Planet Musings

May 15, 2024

Matt Strassler The Standard Model More Deeply: The Electron and its Cousins (Part 2)

From the structure of the Standard Model of particle physics, one might wonder if the electron, muon and tau, so similar except for their masses, might really be the same object seen in three different guises. Last week, starting with a post for general readers, and then looking at the situation in more detail (Part 1 of this two-part series), I showed one way in which this idea fails to agree with experiment. Today I’ll give you another point of failure, focused on a property of particles known as “spin”.

Spin: The Idea In Brief

What is “spin” in physics? It’s related to the word “spin” in English, but with some adjustment. I’ll write a longer article about spin elsewhere, but here’s a brief introduction.

The Earth “spins”, both in the English sense and in the physics sense: it rotates. The same is true of a tennis ball. The rotation can be changed when the ball interacts with a tennis racket or with the ground. It can also be changed when it interacts with another tennis ball. In fact, if two balls strike each other, some of the spin of one ball can be transferred to the other, as in Fig. 1.

Figure 1: (Left) Two balls, the upper one spinning, approach and collide; (Right) The balls recoil from the collision, with some of the spin of the upper ball now transferred to the lower ball.

Elementary “particles” can have spin too, which can be (in part) changed by or transferred to other objects that they interact with. But this type of spin is somewhat different from ordinary rotation.

For example, an electron “spins”. Always. An interaction can change the direction of an electron’s spin, but it cannot change its overall amount. I’ll tell you what I mean by “amount” in a moment, but the amount itself is a famous number:

  • h / 4π = ℏ / 2

where h, Planck’s constant, shows up whenever quantum physics matters. This amount of spin is tiny; if a tennis ball had this amount of spin, it wouldn’t have even rotated once since the universe was born. But for an electron, it’s quite a lot.
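Here’s a rough check of that claim, using ballpark numbers of my own choosing rather than anything from the original post. A tennis ball has a mass of about 57 grams and a radius of about 3.3 centimeters, giving a moment of inertia of roughly 4 × 10⁻⁵ kg m². An angular momentum of ℏ/2 ≈ 5 × 10⁻³⁵ kg m²/s then corresponds to a rotation rate of about 10⁻³⁰ radians per second. Over the age of the universe, about 4 × 10¹⁷ seconds, that adds up to less than 10⁻¹² radians of total rotation, far short of even a single turn.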

Usually, as shorthand, one takes the ℏ to be implicit, and just says that “the electron has spin 1/2”. I’ll do that in what follows.

How can we visualize this spinning? It’s not easy. It’s best not to visualize an electron as a small rotating ball; it’s the wrong picture. Quantum field theory, the modern theory of “particles”, gives a different picture. Just as we should think of light as a wave made of wave-like photons, we should think of the electron as a wave — specifically, a wave in the electron field. [This wave is not to be confused with a wave function, which is something else]. A wave in the electron field can rotate in ways that a little ball cannot. This, however, is hard to draw, and in any case is a story for another day.

The important point is that the spin of an elementary “particle” is not like the simple rotational spinning of a tennis ball, even though it is in the same category. An electron has spin intrinsically, by its very nature; there is no electron without its spin, just as there is no electron without its internal energy and rest mass (as emphasized in my recent book, Chapter 17). That’s certainly not true for a tennis ball, which can have any spin, including none.

These details won’t matter much in what follows, though, so let’s step away from these subtleties and move on.

Ways to Spin

So far, I’ve suggested two types of spinning:

  • the intrinsic spinning (or “intrinsic angular momentum”) of elementary particles
  • the ordinary spinning (or “ordinary angular momentum”) of objects that are rotating, or are in orbit around other objects.

Both contribute to the surprising way that physicists use the word “spin”.

Suppose we have an object that is made of multiple elementary particles. Its total angular momentum involves combining the intrinsic angular momentum of its elementary particles with the ordinary angular momentum that the particles may have as they move around each other. Physicists now do something unexpected: they refer to the object’s total angular momentum as its spin. They do this even though the object is not elementary, meaning that its spin may potentially combine both intrinsic spinning and ordinary spinning of the objects inside it.

So the real meaning of spin, as particle physicists use the term, is this: it is the total angular momentum of an isolated object — an object that may be elementary, or that may itself be formed from multiple elementary objects. That means that the spin of an object like an atom, made of electrons, protons and neutrons, or of a proton, made of quarks, antiquarks and gluons, may potentially arise from multiple sources.

Atoms, Protons and Strings

For example, let’s take a hydrogen atom, made from an electron and a proton. (For starters we’ll treat both of these subatomic particles as though they were elementary. We’ll return to their possible internal structure later.)

I’ll refer in the following to four atomic states, illustrated below in Fig. 2.

  • The ground state of a hydrogen atom (known as the “1s state”) has spin 0. Nevertheless, it is made of an electron with spin 1/2 surrounding a proton of spin 1/2. The two spin in opposite directions, so that their angular momenta cancel.
  • There is a very slightly excited state, often neglected in first-year physics courses or in quick summaries of atomic physics, where the electron and proton spin in the same direction, and their spins add instead of cancelling. This state of the atom has spin 1; I’ll call it the “spin-flipped 1s state”. (The transition from this excited state to the ground state involves the emission of a radio wave photon with a wavelength of 21 cm, leading to the so-called “21 cm line” widely observed in astronomy.)
  • There are more dramatically excited states known as the “2s and 2p states”. The 2s state has spin 0, while the 2p state has spin 1. But even though the 2p state and the spin-flipped state both have spin 1, their spins have different origins. The total angular momentum of the 2p state does not come from the intrinsic angular momenta of the electron and proton; those cancel out, just as they do in the 1s and 2s states. Instead, the spin of the 2p state comes from a sort of rotational motion of the electron around the proton.

These four states are sketched in Figure 2. (Spin-flipped versions of the 2s and 2p states exist but are not shown.)

Figure 2: (Bottom left) In the ground state of hydrogen, the spin of the proton (central red dot) is opposite to that of the electron (surrounding blue cloud), so that the atom has spin 0. (Bottom right) If the electron spin is flipped, both the proton and electron spin in the same sense, giving the atom spin 1. (Top) While the 2s state is similar to the 1s state, the 2p state has the electron and proton spinning in opposite directions but has the electron moving around the proton (dashed black line), giving the atom spin 1.

The fact that the ground state has spin 0, and yet the 2p state has larger spin specifically due to the electron’s motion around the proton, illustrates the main point of this post. If an object is made from multiple constituent objects, nothing can prevent those constituents from moving around one another. That means they can have ordinary (or “orbital”) angular momentum, which then contributes, along with the constituents’ spin, to the combined object’s spin — i.e., to its total angular momentum.

Therefore, an object that is not elementary, and so contains multiple objects inside it, will inevitably have excited states with different amounts of spin. Indeed, the hydrogen atom has excited states of total angular momentum 0, 1, 2, 3, and so on. All atoms exhibit similar behavior.
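For readers who like to track the bookkeeping, the counting here follows the standard quantum rules for combining angular momenta (nothing specific to this post). Two spin-1/2 constituents can combine to total spin 0 or total spin 1, which is the difference between the 1s state and its spin-flipped partner. Adding ℓ units of orbital angular momentum to a pair of spins that combine to spin s gives a total anywhere from |ℓ − s| to ℓ + s; for the 2p state, ℓ = 1 and s = 0, giving total spin 1. Letting ℓ grow without bound is what produces excited states of every integer spin.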

The same applies for protons, which aren’t elementary either. The proton has spin 1/2, but its excited states have spin 1/2, 3/2, 5/2, 7/2, and so on. The first excited state of the proton, the Delta, has spin 3/2. This is most easily understood as a rearrangement of the spins of the quarks, gluons and anti-quarks that it contains (though the exact rearrangement is not obvious, due to the complexity of a proton; see also chapter 6 of the book.) The next excited state, the p(1440), has spin 1/2 like the proton. But many other excited states have been observed, with spin 3/2, 5/2, 7/2, 9/2, and perhaps even 11/2.

A string, such as one finds in string theory, is another object with internal constituents, which one might call bits-of-string. A string can always be spun faster. That’s why, in the superstring theory that is sometimes touted as a potential “theory-of-everything” (or in less grandiose language, a complete theory of space, fields and particles), there are states of all possible spin — any integer times 1/2. Unfortunately, were this really the theory of our universe, the higher-spin states of the string would probably have far too much mass for us to make them in near-term experiments, putting this prediction of the theory out of reach for now.

But string theory isn’t just useful in this rarefied context. It can also be used to describe the physics of “hadrons” — objects made from quarks and gluons, including protons. All indications from experiments and numerical calculations do indeed suggest that hadrons come in all possible spins; this includes the excited states of the proton already mentioned above. (That said, the higher the spin, the harder it is to make the states, making it more and more challenging to observe them.)

Spin and the Electron, Muon and Tau

None of this is true for electrons, muons or taus, all of which have spin 1/2. No electron-like particle with spin 3/2 has ever been observed.

This argues strongly against the electron, muon and tau all being made from the same object. Atoms, protons and strings all have excited states with the same spin as the ground state, but at roughly the same mass, they also have excited states with more spin. If the muon were an excited state of the electron, we would expect to see an object with spin 3/2 that has a mass comparable to the muon, and certainly below the mass of the tau. Such a state would easily have been observed decades ago, so it doesn’t exist.

Are there loopholes to this logic? Yes. It is possible, in special circumstances, for the excited states with higher spin to have much larger masses than the excited states which share the same spin as the ground state. This is a long story which I won’t try to tell here, but examples arise in the context of extra dimensions, and others in the related context of exotic theories of quark-like and gluon-like objects (with buzzwords such as “AdS/CFT” or “gauge/string duality”).

However, it’s hard to apply the loophole to the muon and tau. In such a scenario, the electron should have many more cousins than just two, and some of the others should have been observed by now.

Furthermore, data now confirms that both the tau and muon get their rest mass from the Higgs field; see Fig. 3. For such particles, the loopholes I just mentioned don’t apply.

Figure 3: The interaction strengths of various types of elementary particles with the Higgs field, plotted versus those particles’ rest masses. Any particle whose rest mass comes entirely or largely from the Higgs field should lie on or near the dashed line. The data shows this is true of both the tau τ and the muon μ, as well as of the bottom and top quarks b, t, and of the W and Z bosons.

We must also recall the arguments given in the first part of this series. If muons and taus are excited states of electrons, it should be possible for any sufficiently energetic collision to turn an electron into a muon or tau, and for decays via photons to do the reverse. But these processes are not observed.

In short, the properties of the electron, muon and tau disfavor the idea that they are somehow secretly the same object in three different quantum states. The explanation of their similarities must lie elsewhere.

May 14, 2024

Matt Strassler Some Upcoming Speaking Events, One Tomorrow

Three events coming up!

Tomorrow, Wednesday, May 15th, in the town of Northampton, Massachusetts, I’ll be speaking about my book — specifically, about why and how the relationship between ourselves and the universe is not what it seems. The event will be held at the Broadside Bookshop at 7pm. If you’re in the Pioneer Valley, please come by! And let your friends in the area know, too.

Next, on the night of Thursday, June 6th at 6:30 pm, I’ll be at the Boston Public Library on a panel entitled “Particle Physics: Where the Universe and Humanity Collide.” The other members of the panel will be

  • Katrina Miller, PhD – a Chicago-based science reporter & essayist who has written for, among others, the New York Times and Wired magazine
  • Sarah Demers – Professor of Physics at Yale University and a frequent science communicator

This is a public event for a general audience, arranged as part of the 12th Annual Large Hadron Collider Physics Conference. This event will be live-streamed. More details to come.

Finally, I’ll also be speaking about my book on Monday, June 24th at The Bookstore in Lenox, MA. Stay tuned for details on this as well.

May 13, 2024

John Baez Agent-Based Models (Part 9)

Since May 1st, Kris Brown, Nathaniel Osgood, Xiaoyan Li, William Waites and I have been meeting daily in James Clerk Maxwell’s childhood home in Edinburgh.

We’re hard at work on our project called New Mathematics and Software for Agent-Based models. It’s impossible to explain everything we’re doing while it’s happening. But I want to record some of our work. So please pardon how I skim through a bunch of already known ideas in my desperate attempt to quickly reach the main point. I’ll try to make up for this by giving lots of references.

Today I’ll talk about an interesting class of models we have developed together with Sean Wu. We call them ‘stochastic C-set rewriting systems’. They’re just part of our overall framework, but they’re an important part.

In this sort of model time is continuous, the state of the world is described by discrete data, and the state changes stochastically at discrete moments in time. All those features are already present in the class of models I described in Part 7. But today’s models are far more general, because the state of the world is described in a more general way! Now the state of the world at any moment of time is a C-set: a functor

f \colon C \to \mathsf{Set}

from some fixed finitely presented category C to the category of sets.

C-sets are a flexible generalization of directed graphs. For example, a thing like this is a C-set for an appropriate choice of C:

There are also C-sets that look even less like graphs.
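To make this concrete, here is a minimal Python sketch of my own (not the AlgebraicJulia implementation) of the simplest interesting example: a directed graph regarded as a C-set, where C is generated by two objects E and V and two morphisms src, tgt : E → V. A C-set assigns a set to each object and a function to each morphism.

# A directed graph encoded as a C-set: a set for each object (V and E),
# a function (here: a dict) for each morphism (src, tgt : E -> V).
graph = {
    "V": {1, 2, 3},
    "E": {"a", "b"},
    "src": {"a": 1, "b": 2},   # source vertex of each edge
    "tgt": {"a": 2, "b": 3},   # target vertex of each edge
}

def is_cset(g):
    # Check that src and tgt really are functions from E to V.
    return all(g["src"][e] in g["V"] and g["tgt"][e] in g["V"] for e in g["E"])

Choosing a different category C changes the shape of the data, and that flexibility is exactly what the models below exploit.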

C-sets have been implemented in AlgebraicJulia, a software framework for doing scientific computation with categories. To learn more, start here:

• Evan Patterson, Graphs and C-sets I: What is a graph?, AlgebraicJulia Blog, 1 September 2020.

There’s a lot more on this blog explaining things you can do with C-sets, and how they’re implemented in AlgebraicJulia. We plan to take advantage of all this stuff!

In particular, we’ll use ‘double pushout rewriting’ to specify rules for how a C-set can change with time. If you’re not familiar with this concept, start here:

• nLab, Double pushout rewriting.

This concept is well-understood (by those who understand it well), so I’ll just roughly sketch it. In double pushout rewriting for C-sets, a rewrite rule is a diagram of C-sets

L \stackrel{\ell}{\hookleftarrow} I \stackrel{r}{\to} R

To apply this rewrite rule to a C-set S, we find inside that C-set an instance of the pattern L, called a ‘match’, and replace it with the pattern R. These ‘patterns’ are themselves C-sets. The C-set I can be thought of as the common part of L and R. The maps \ell and r tell us how this common part fits into L and R.

Note that in this incredibly sketchy explanation I am already starting to use maps between C-sets! Indeed, for each category C there’s a category called C\mathsf{Set} with:

• functors f \colon C \to \mathsf{Set} as objects: we call these C-sets;

• natural transformations between such functors as morphisms: we call these C-set maps.

Categories of this sort have been intensively studied for many decades, and there’s a huge amount we can do with them:

• nLab, Category of presheaves.

I used C-set maps in a couple of places above. First, the arrows here

L \stackrel{\ell}{\hookleftarrow} I \stackrel{r}{\to} R

are C-set maps. For slightly technical reasons we demand that \ell be monic: that’s why I drew it with a hooked arrow. Second, I introduced the term ‘match’ without defining it. But we can define it: a match of L to a C-set S is simply a C-set map

L \to S
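In code, finding the matches of a small pattern graph L in a larger graph S just means enumerating the C-set maps from L to S. Here is a brute-force sketch of my own, continuing the dictionary encoding above; real implementations such as AlgebraicRewriting.jl are far more clever about this.

from itertools import product

def find_graph_matches(L, S):
    # Enumerate all C-set maps L -> S between graphs: a function on vertices
    # plus a function on edges, together commuting with src and tgt.
    Lv, Le = sorted(L["V"]), sorted(L["E"])
    matches = []
    for v_imgs in product(sorted(S["V"]), repeat=len(Lv)):
        fV = dict(zip(Lv, v_imgs))
        for e_imgs in product(sorted(S["E"]), repeat=len(Le)):
            fE = dict(zip(Le, e_imgs))
            if all(fV[L["src"][e]] == S["src"][fE[e]]
                   and fV[L["tgt"][e]] == S["tgt"][fE[e]] for e in Le):
                matches.append((fV, fE))
    return matches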

And now for some good news: Kris Brown has already implemented double pushout rewriting for C-sets in AlgebraicJulia:

• Github, AlgebraicRewriting.jl.

Stochastic C-set rewriting systems

Now comes the main idea I want to explain.

A stochastic C-set rewriting system consists of:

1) a category C

2) a finite collection of rewrite rules

\rho_i = \left( L_i \stackrel{\ell_i}{\hookleftarrow} I_i \stackrel{r_i}{\to} R_i \right)

3) for each rewrite rule \rho_i in our collection, a timer T_i. This is a stochastic map

T_i \colon [0,\infty) \to [0,\infty]

That’s all.

What does this do for us? First, it means that for each choice of rewrite rule \rho_i in our collection, and for each so-called start time t \ge 0, we get a probability measure T_i(t) on [0,\infty].

Let’s write w_i(t) to mean a randomly chosen element of [0,\infty] distributed according to the probability measure T_i(t). We call w_i(t) the wait time, because it says how long after time t we should wait until we apply the rewrite rule \rho_i. The time

t + w_i(t)

is called the rewrite time.

In what follows, I’ll always assume these randomly chosen numbers w_i(t) are stochastically independent—even if we reuse the same timer repeatedly for different tasks.
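As a concrete illustration of what a timer can be, here are a few examples of my own, written as Python functions that take a start time t and return one sampled wait time; nothing in the framework forces these particular choices.

import math
import random

def exponential_timer(rate):
    # Wait times are exponentially distributed and independent of the start
    # time t, as for an ordinary Poisson process.
    return lambda t: random.expovariate(rate)

def slowing_timer(rate):
    # The distribution may depend on the start time: here the effective rate
    # decays as t grows, so this rule fires more and more rarely.
    return lambda t: random.expovariate(rate / (1.0 + t))

def reluctant_timer(rate, p_never):
    # With probability p_never the wait time is infinite, so the corresponding
    # rewrite rule has probability less than 1 of ever being applied.
    return lambda t: math.inf if random.random() < p_never else random.expovariate(rate)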

Running a stochastic C-set rewriting system

Okay, so how do we actually use this for modeling? How do we ‘run’ a context-independent stochastic C-set rewriting system? I’ll sketch it out.

The idea is that at any time t \ge 0, the state of the world is some C-set, say S_t. If you give me the initial state of the world S_0, the stochastic C-set rewriting system will tell you how to compute the state of the world at all later times. But this computation involves randomness.

Here’s how it works:

We start at t = 0. We look for all matches to patterns L_i in the initial state S_0. For each match we compute a wait time w_i(t) \in [0,\infty] and then the rewrite time t + w_i(t), but right now t = 0. We make a table of all the matches and their rewrite times.

The smallest of the rewrite times in our table, say 0 + w_j(0), is the first time the state of the world can change. We change it by applying the rewrite rule \rho_j to the state of the world S_0. When we do this, we cross off the rewrite time 0 + w_j(0) and its corresponding match from our table.

More generally, suppose t is any time when the state of the world changes. It will have changed by applying some rewrite rule \rho_j to the previous state of the world, giving some new C-set S_t.

When this happens, new matches can appear, and existing matches can disappear. So we do this:

1) For each existing match that disappears, we cross off that match and its rewrite time from our table.

2) For each new match that appears, say one involving the rewrite rule \rho_i, we add that match and its rewrite time t + w_i(t) to our table.

We then wait until the smallest rewrite time in our table, say t'. At that time, we apply the corresponding rewrite rule to the state S_t, getting some new C-set S_{t'}. We also cross off the rewrite time t' and its corresponding match from our table.

Then just keep doing the loop.
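Here is a minimal Python sketch of that loop, written by me as an illustration rather than taken from the authors’ software. It assumes matches are hashable values, that find_matches(i, state) lists the current matches of rule i’s pattern in the state, that apply_rule(i, match, state) returns the rewritten state, and that timers[i] is a timer in the sense above.

import math

def run(initial_state, n_rules, timers, find_matches, apply_rule, t_end):
    # Simulate a stochastic C-set rewriting system up to time t_end.
    t, state = 0.0, initial_state
    table = {}   # scheduled events: (rule index, match) -> rewrite time

    def add_new_matches(now):
        # Any match not already in the table gets a wait time from its timer.
        for i in range(n_rules):
            for m in find_matches(i, state):
                if (i, m) not in table:
                    table[(i, m)] = now + timers[i](now)

    def drop_dead_matches():
        # Cross off table entries whose match no longer exists in the state.
        live = {(i, m) for i in range(n_rules) for m in find_matches(i, state)}
        for key in list(table):
            if key not in live:
                del table[key]

    add_new_matches(t)
    while table:
        # The smallest rewrite time in the table is the next event.
        (i, match), t_next = min(table.items(), key=lambda kv: kv[1])
        if t_next == math.inf or t_next > t_end:
            break
        t = t_next
        state = apply_rule(i, match, state)   # apply rule i at this match
        del table[(i, match)]                 # cross off the event just used
        drop_dead_matches()
        add_new_matches(t)
    return state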

Subtleties

A lot of the subtleties in this formalism involve our use of timers.

For example, I computed wait times w_i(t) using a timer which is a stochastic map

T_i \colon [0,\infty) \to [0,\infty]

The dependence on t \in [0,\infty) here means the wait time can depend on when we start the timer. And the fact that this stochastic map takes values in [0,\infty] means the wait time can be infinite. This is a way of letting rewrite rules have a probability < 1 of ever being applied. If you don't like these features you can easily limit the formalism to avoid them.

The more serious subtleties involve whether and how to change wait times as the state of the world changes. For example, we can imagine more general timers that explicitly depend on the current state of the world as well as the time t \in [0,\infty). However, in this case I am confused about how we should update our table of wait times as the state of the world changes. So I decided to postpone discussing this generalization!

John Preskill Let gravity do its work

One day, early this spring, I found myself in a hotel elevator with three other people. The cohort consisted of two theoretical physicists, one computer scientist, and what appeared to be a normal person. I pressed the elevator’s 4 button, as my husband (the computer scientist) and I were staying on the hotel’s fourth floor. The button refused to light up.

“That happened last time,” the normal person remarked. He was staying on the fourth floor, too.

The other theoretical physicist pressed the 3 button.

“Should we press the 5 button,” the normal person continued, “and let gravity do its work?”

I took a moment to realize that he was suggesting we ascend to the fifth floor and then induce the elevator to fall under gravity’s influence to the fourth. We were reaching floor three, so I exchanged a “have a good evening” with the other physicist, who left. The door shut, and we began to ascend.

“As it happens,” I remarked, “he’s an expert on gravity.” The other physicist was Herman Verlinde, a professor at Princeton.

Such is a side effect of visiting the Simons Center for Geometry and Physics. The Simons Center graces the Stony Brook University campus, which was awash in daffodils and magnolia blossoms last month. The Simons Center derives its name from hedge-fund manager Jim Simons (who passed away during the writing of this article). He did landmark research in physics and math before earning his fortune on Wall Street as a quant. Simons supported his early loves by funding the Simons Center and other scientific initiatives. The center reminded me of the Perimeter Institute for Theoretical Physics, down to the café’s linen napkins, so I felt at home.

I was participating in the Simons Center workshop “Entanglement, thermalization, and holography.” It united researchers from quantum information and computation, black-hole physics and string theory, quantum thermodynamics and many-body physics, and nuclear physics. We were to share our fields’ approaches to problems centered on thermalization, entanglement, quantum simulation, and the like. I presented about the eigenstate thermalization hypothesis, which elucidates how many-particle quantum systems thermalize. The hypothesis fails, I argued, if a system’s dynamics conserve quantities (analogous to energy and particle number) that can’t be measured simultaneously. Herman Verlinde discussed the ER=EPR conjecture.

My PhD advisor, John Preskill, blogged about ER=EPR almost exactly eleven years ago. Read his blog post for a detailed introduction. Briefly, ER=EPR posits an equivalence between wormholes and entanglement. 

The ER stands for Einstein–Rosen, as in Einstein–Rosen bridge. Sean Carroll provided the punchiest explanation I’ve heard of Einstein–Rosen bridges. He served as the scientific advisor for the 2011 film Thor. Sean suggested that the film feature a wormhole, a connection between two black holes. The filmmakers replied that wormholes were passé. So Sean suggested that the film feature an Einstein–Rosen bridge. “What’s an Einstein–Rosen bridge?” the filmmakers asked. “A wormhole.” So Thor features an Einstein–Rosen bridge.

EPR stands for Einstein–Podolsky–Rosen. The three authors published a quantum paradox in 1935. Their EPR paper galvanized the community’s understanding of entanglement.

ER=EPR is a conjecture that entanglement is closely related to wormholes. As Herman said during his talk, “You probably need entanglement to realize a wormhole.” Or any two maximally entangled particles are connected by a wormhole. The idea crystallized in a paper by Juan Maldacena and Lenny Susskind. They drew on work by Mark Van Raamsdonk (who masterminded the workshop behind this Quantum Frontiers post) and Brian Swingle (who’s appeared in further posts).

Herman presented four pieces of evidence for the conjecture, as you can hear in the video of his talk. One piece emerges from the AdS/CFT duality, a parallel between certain space-times (called anti–de Sitter, or AdS, spaces) and quantum theories that have a certain symmetry (called conformal field theories, or CFTs). A CFT, being quantum, can contain entanglement. One entangled state is called the thermofield double. Suppose that a quantum system is in a thermofield double and you discard half the system. The remaining half looks thermal—we can attribute a temperature to it. Evidence indicates that, if a CFT has a temperature, then it parallels an AdS space that contains a black hole. So entanglement appears connected to black holes via thermality and temperature.
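For concreteness, here is the standard formula behind those words (quoted from general knowledge rather than from Herman’s talk). If the system has energy eigenstates |n⟩ with energies E_n, the thermofield double at inverse temperature β pairs two copies of the system as

|TFD⟩ = (1/√Z) Σ_n e^(−βE_n/2) |n⟩ ⊗ |n⟩,  with Z = Σ_n e^(−βE_n).

Tracing out either copy leaves the thermal state e^(−βH)/Z, which is why the remaining half “looks thermal” and carries a temperature.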

Despite the evidence—and despite the eleven years since John’s publication of his blog post—ER=EPR remains a conjecture. Herman remarked, “It’s more like a slogan than anything else.” His talk’s abstract contains more hedging than a suburban yard. I appreciated the conscientiousness, a college acquaintance having once observed that I spoke carefully even over sandwiches with a friend.

A “source of uneasiness” about ER=EPR, to Herman, is measurability. We can’t check whether a quantum state is entangled via any single measurement. We have to prepare many identical copies of the state, measure the copies, and process the outcome statistics. In contrast, we seem able to conclude that a space-time is connected without measuring multiple copies of the space-time. We can check that a hotel’s first floor is connected to its fourth, for instance, by riding in an elevator once.

Or by riding an elevator to the fifth floor and descending by one story. My husband, the normal person, and I took the stairs instead of falling. The hotel fixed the elevator within a day or two, but who knows when we’ll fix on the truth value of ER=EPR?

With thanks to the conference organizers for their invitation, to the Simons Center for its hospitality, to Jim Simons for his generosity, and to the normal person for inspiration.

May 12, 2024

Scott Aaronson Jim Simons (1938-2024)

When I learned of Jim Simons’s passing, I was actually at the Simons Foundation headquarters in lower Manhattan, for the annual board meeting of the unparalleled Quanta Magazine, which Simons founded and named. The meeting was interrupted to share the sad news, before it became public … and then it was continued, because that’s obviously what Simons would’ve wanted. An oil portrait of Simons in the conference room took on new meaning.

See here for the Simons Foundation’s announcement, or here for the NYT’s obituary.

Although the Simons Foundation has had multiple significant influences on my life—funding my research, founding the Simons Institute for Theory of Computing in Berkeley that I often visit (including two weeks ago), and much more—I’ve exchanged all of a few sentences with Jim Simons himself. At a previous Simons Foundation meeting, I think he said he’d heard I’d moved from MIT to UT Austin, and asked whether I’d bought a cowboy hat yet. I said I did but I hadn’t yet worn it non-ironically, and he laughed at that. (My wife Dana knew him better, having spent a day at a brainstorming meeting for what became the Simons Institute, his trademark cigar smoke filling the room.)

I am, of course, in awe of what Jim Simons achieved in all three phases of his career — firstly, in mathematical research, where he introduced the Chern-Simons form, made other pioneering contributions, and led the math department at Stony Brook; secondly, in founding Renaissance and making insane amounts of money (“disproving the Efficient Market Hypothesis,” as some have claimed); and thirdly, in giving his money away to support basic research and the public understanding of it.

I’m glad that Simons, as a lifelong chain smoker, made it all the way to age 86. And I’m glad that the Simons Foundation, which I’m told will continue in perpetuity with no operational changes, will stand as a testament to his vision for the world.

May 11, 2024

John Preskill To thermalize, or not to thermalize, that is the question.

The Noncommuting-Charges World Tour (Part 3 of 4)

This is the third part of a four-part series covering the recent Perspective on noncommuting charges. I’ll post one part every ~6 weeks leading up to my PhD thesis defence. You can find Part 1 here and Part 2 here.

If Hamlet had been a system of noncommuting charges, his famous soliloquy may have gone like this…

To thermalize, or not to thermalize, that is the question:
Whether ’tis more natural for the system to suffer
The large entanglement of thermalizing dynamics,
Or to take arms against the ETH
And by opposing inhibit it. To die—to thermalize,
No more; and by thermalization to say we end
The dynamical symmetries and quantum scars
That complicate dynamics: ’tis a consummation
Devoutly to be wish’d. To die, to thermalize;
To thermalize, perchance to compute—ay, there’s the rub:
For in that thermalization our quantum information decoheres,
When our coherence has shuffled off this quantum coil,
Must give us pause—there’s the respect
That makes calamity of resisting thermalization.

Hamlet (the quantum steampunk edition)


In the original play, Hamlet grapples with the dilemma of whether to live or die. Noncommuting charges have a dilemma regarding whether they facilitate or impede thermalization. Among the five research opportunities highlighted in the Perspective article, resolving this debate is my favourite opportunity due to its potential implications for quantum technologies. A primary obstacle in developing scalable quantum computers is mitigating decoherence; here, thermalization plays a crucial role. If systems with noncommuting charges are shown to resist thermalization, they may contribute to quantum technologies that are more resistant to decoherence. Systems with noncommuting charges, such as spin systems and squeezed states of light, naturally occur in quantum computing models like quantum dots and optical approaches. This possibility is further supported by recent advances demonstrating that non-Abelian symmetric operations are universal for quantum computing (see references 1 and 2).

In this penultimate blog post of the series, I will review some results that argue both in favour of and against noncommuting charges hindering thermalization. This discussion includes content from Sections III, IV, and V of the Perspective article, along with a dash of some related works at the end—one I recently posted and another I recently found. The results I will review do not directly contradict one another because they arise from different setups. My final blog post will delve into the remaining parts of the Perspective article.

Playing Hamlet is like jury duty for actors–sooner or later, you’re getting the call (source).

Arguments for hindering thermalization

The first argument supporting the idea that noncommuting charges hinder thermalization is that they can reduce the production of thermodynamic entropy. In their study, Manzano, Parrondo, and Landi explore a collisional model involving two systems, each composed of numerous subsystems. In each “collision,” one subsystem from each system is randomly selected to “collide.” These subsystems undergo a unitary evolution during the collision and are subsequently returned to their original systems. The researchers derive a formula for the entropy production per collision within a certain regime (the linear-response regime). Notably, one term of this formula is negative if and only if the charges do not commute. Since thermodynamic entropy production is a hallmark of thermalization, this finding implies that systems with noncommuting charges may thermalize more slowly. Two other extensions support this result.

The second argument stems from an essential result in quantum computing. This result is that every algorithm you want to run on your quantum computer can be broken down into gates you run on one or two qubits (the building blocks of quantum computers). Marvian’s research reveals that this principle fails when dealing with charge-conserving unitaries. For instance, consider the charge as energy. Marvian’s results suggest that energy-preserving interactions between neighbouring qubits don’t suffice to construct all energy-preserving interactions across all qubits. The restrictions become more severe when dealing with noncommuting charges. Local interactions that preserve noncommuting charges impose stricter constraints on the system’s overall dynamics compared to commuting charges. These constraints could potentially reduce chaos, something that tends to lead to thermalization.

Adding to the evidence, we revisit the eigenstate thermalization hypothesis (ETH), which I discussed in my first post. The ETH essentially asserts that if an observable and Hamiltonian adhere to the ETH, the observable will thermalize. This means its expectation value stabilizes over time, aligning with the expectation value of the thermal state, albeit with some important corrections. Noncommuting charges cause all kinds of problems for the ETH, as detailed in these two posts by Nicole Yunger Halpern. Rather than reiterating Nicole’s succinct explanations, I’ll present the main takeaway: noncommuting charges undermine the ETH. This has led to the development of a non-Abelian version of the ETH by Murthy and collaborators. This new framework still predicts thermalization in many, but not all, cases. Under a reasonable physical assumption, the previously mentioned corrections to the ETH may be more substantial.
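For readers who want the equation behind those words, here is the standard form of the ETH, quoted from general knowledge rather than from the Perspective. It says that for a physically reasonable observable O, the matrix elements in the energy eigenbasis take the form

⟨E_m| O |E_n⟩ ≈ O(Ē) δ_mn + e^(−S(Ē)/2) f_O(Ē, ω) R_mn,

where Ē = (E_m + E_n)/2, ω = E_m − E_n, O(Ē) is the microcanonical average, S is the thermodynamic entropy, f_O is a smooth function, and R_mn is an erratic factor of order one. The diagonal term is what makes long-time expectation values agree with thermal ones; the exponentially small off-diagonal term controls the fluctuations and the corrections mentioned above.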

If this story ended here, I would have needed to reference a different Shakespearean work. Fortunately, the internal conflict inherent in noncommuting charges aligns well with Hamlet. Noncommuting charges appear to impede thermalization in various aspects, yet paradoxically, they also seem to promote it in others.

Arguments for promoting thermalization

Among the many factors accompanying the thermalization of quantum systems, entanglement is one of the most studied. Last year, I wrote a blog post explaining how my collaborators and I constructed analogous models that differ in whether their charges commute. One of the paper’s results was that the model with noncommuting charges had higher average entanglement entropy. As a result of that blog post, I was invited to CBC’s “Quirks & Quarks” Podcast to explain, on national radio, whether quantum entanglement can explain the extreme similarities we see in identical twins who are raised apart. Spoilers for the interview: it can’t, but wouldn’t it be grand if it could?

Following up on that work, my collaborators and I introduced noncommuting charges into monitored quantum circuits (MQCs)—quantum circuits with mid-circuit measurements. MQCs offer a practical framework for exploring how, for example, entanglement is affected by the interplay between unitary dynamics and measurements. MQCs with no charges or with commuting charges have a weakly entangled phase (“area-law” phase) when the measurements are done often enough, and a highly entangled phase (“volume-law” phase) otherwise. However, in MQCs with noncommuting charges, this weakly entangled phase never exists. In its place, there is a critical phase marked by long-range entanglement. This finding supports our earlier observation that noncommuting charges tend to increase entanglement.

I recently looked at a different angle to this thermalization puzzle. It’s well known that most quantum many-body systems thermalize; some don’t. In those that don’t, what effect do noncommuting charges have? One paper that answers this question is covered in the Perspective. Here, Potter and Vasseur study many-body localization (MBL). Imagine a chain of spins that are strongly interacting. We can add a disorder term, such as an external field whose magnitude varies across sites on this chain. If the disorder is sufficiently strong, the system “localizes.” This implies that if we measured the expectation value of some property of each qubit at some time, it would maintain that same value for a while. MBL is one type of behaviour that resists thermalization. Potter and Vasseur found that noncommuting charges destabilize MBL, thereby promoting thermalizing behaviour.

In addition to the papers discussed in our Perspective article, I want to highlight two other studies of how systems can avoid thermalization. One mechanism is the presence of “dynamical symmetries” (these are “spectrum-generating algebras” with a locality constraint). These are operators that act similarly to ladder operators for the Hamiltonian. For any observable that overlaps with these dynamical symmetries, the observable’s expectation value will continue to evolve over time and will not thermalize in accordance with the ETH. In my recent work, I demonstrate that noncommuting charges remove the non-thermalizing dynamics that emerge from dynamical symmetries.

Additionally, I came across a study by O’Dea, Burnell, Chandran, and Khemani, which proposes a method for constructing Hamiltonians that exhibit quantum scars. Quantum scars are unique eigenstates of the Hamiltonian that do not thermalize despite being surrounded by a spectrum of other eigenstates that do thermalize. Their approach involves creating a Hamiltonian with noncommuting charges and subsequently breaking the non-Abelian symmetry. When the symmetry is broken, quantum scars appear; if the non-Abelian symmetry is restored, the quantum scars vanish. These last three results suggest that noncommuting charges impede various types of non-thermalizing dynamics.

Unlike Hamlet, the narrative of noncommuting charges is still unfolding. I wish I could conclude with a dramatic finale akin to the duel between Hamlet and Laertes, Claudius’s poisoning, and the proclamation of a new heir to the Danish throne. However, that chapter is yet to be written. “To thermalize or not to thermalize?” We will just have to wait and see.

Terence Tao Two announcements: AI for Math resources, and erdosproblems.com

This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a second followup webinar to the aforementioned workshop, building on the topics covered there. (The first webinar may be found here.)

Secondly, I would like to advertise the erdosproblems.com website, launched recently by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.

Thomas is currently issuing a call for help to develop the erdosproblems.com website in a number of ways (quoting directly from that page):

  • You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
  • You know things about web design and have suggestions for how this website could look or perform better;
  • You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
  • You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
  • You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
  • You knew Erdős and have any memories or personal correspondence concerning a particular problem;
  • You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
  • You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
  • You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
  • You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!

I for instance contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well known photo of Paul and myself, and he communicated it to me again shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.

(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)

Scott Aaronson UmeshFest

Unrelated Announcements: See here for a long interview with me in The Texas Orator, covering the usual stuff (quantum computing, complexity theory, AI safety). And see here for a podcast with me and Spencer Greenberg about a similar mix of topics.


A couple weeks ago, I helped organize UmeshFest: Don’t Miss This Flight, a workshop at UC Berkeley’s Simons Institute to celebrate the 2^6th birthday of my former PhD adviser Umesh Vazirani. Peter Shor, John Preskill, Manuel Blum, Madhu Sudan, Sanjeev Arora, and dozens of other luminaries of quantum and classical computation were on hand to help tell the story of quantum computing theory and Umesh’s central role in it. There was also constant roasting of Umesh—of his life lessons from the squash court, his last-minute organizational changes and phone calls at random hours. I was delighted to find that my old coinage of “Umeshisms” was simply standard usage among the attendees.


At Berkeley, many things were as I remembered them—my favorite Thai eatery, the bubble tea, the Campanile—but not everything was the same. Here I am in front of Berkeley’s Gaza encampment, a.k.a. its “Anti Zionism Zone” or what was formerly Sproul Plaza (zoom into the chalk):

I felt a need to walk through the Anti Zionism Zone day after day (albeit unassumingly, neither draped in an Israeli flag nor looking to start an argument with anyone), for more-or-less the same reasons why the US regularly sends aircraft carriers through the Strait of Taiwan.


Back in the more sheltered environment of the Simons Institute, it was great to be among friends, some of whom I hadn’t seen since before Covid. Andris Ambainis and I worked together for a bit on an open problem in quantum query complexity, for old times’ sake (we haven’t solved it yet).

And then there were talks! I thought I’d share my own talk, which was entitled The Story of BQP (Bounded-Error Quantum Polynomial-Time). Here are the PowerPoint slides, but I’ll also share screen-grabs for those of you who constantly complain that you can’t open PPTX files.

I was particularly proud of the design of my title slide:

Moving on:

The class BQP/qpoly, I should explain, is all about an advisor who’s all-wise and perfectly benevolent, but who doesn’t have a lot of time to meet with his students, so he simply doles out the same generic advice to all of them, regardless of their thesis problem x.

I then displayed my infamous “Umeshisms” blog post from 2005—one of the first posts in the history of this blog:

As I explained, now that I hang out with the rationalist and AI safety communities, which are also headquartered in Berkeley, I’ve learned that my “Umeshisms” post somehow took on a life of its own. Once, when dining at one of the rationalists’ polyamorous Berkeley group houses, I said this has been lovely but I’ll now need to leave, to visit my former PhD adviser Umesh Vazirani. “You mean the Umesh?!” the rationalists excitedly exclaimed. “Of Umeshisms? If you’ve never missed a flight?”

But moving on:

(Note that by “QBPP,” Berthiaume and Brassard meant what we now call BQP.)

Feynman and Deutsch asked exactly the right question—does simulating quantum mechanics on a classical computer inherently produce an exponential slowdown, or not?—but they lacked most of the tools to start formally investigating the question. A factor-of-two quantum speedup for the XOR function could be dismissed as unimpressive, while a much greater quantum speedup for the “constant vs. balanced” problem could be dismissed as a win against only deterministic classical algorithms, rather than randomized algorithms. Deutsch-Jozsa may have been the first time that an apparent quantum speedup faltered in an honest comparison against classical algorithms. It certainly wasn’t the last!

Ah, but this is where Bernstein and Vazirani enter the scene.

Bernstein and Vazirani didn’t merely define BQP, which remains the central object of study in quantum complexity theory. They also established its most basic properties:

And, at least in the black-box model, Bernstein and Vazirani gave the first impressive quantum speedup for a classical problem that survived in a fair comparison against the best classical algorithm:

The Recursive Bernstein-Vazirani problem, also called Recursive Fourier Sampling, is constructed as a “tree” of instances of the Bernstein-Vazirani problem, where to query the Boolean function at any given level, you need to solve a Bernstein-Vazirani problem for a Boolean function at the level below it, and then run the secret string s through a fixed Boolean function g. For more, see my old paper Quantum Lower Bound for Recursive Fourier Sampling.

Each Bernstein-Vazirani instance has classical query complexity n and quantum query complexity 1. So, if the tree of instances has depth d, then overall the classical query complexity is n^d, while the quantum query complexity is only 2^d. Where did the 2 come from? From the need to uncompute the secret strings s at each level, to enable quantum interference at the next level up—thereby forcing us to run the algorithm twice. A key insight.
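For concreteness: with the standard choice of depth d = log2(n), the quantum algorithm uses about 2^d = n queries, while the classical query complexity is roughly n^d = n^(log2 n), which is quasipolynomial in n. So the black-box speedup is superpolynomial, though not exponential.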

The Recursive Fourier Sampling separation set the stage for Simon’s algorithm, which gave a more impressive speedup in the black-box model, and thence for the famous Shor’s algorithm for factoring and discrete log:

But Umesh wasn’t done establishing the most fundamental properties of BQP! There’s also the seminal 1994 paper by Bennett, Bernstein, Brassard, and Vazirani:

In light of the BV and BBBV papers, let’s see how BQP seems to fit with classical complexity classes—an understanding that’s remained largely stable for the past 30 years:

We can state a large fraction of the research agenda of the whole field, to this day, as questions about BQP:

I won’t have time to discuss all of these questions, but let me at least drill down on the first few.

Many people hoped the list of known problems in BQP would now be longer than it is. So it goes: we don’t decide the truth, we only discover it.

As a 17-year-old just learning about quantum computing in 1998 by reading the Bernstein-Vazirani paper, I was thrilled when I managed to improve their containment BQP ⊆ P^#P to BQP ⊆ PP. I thought that would be my big debut in quantum complexity theory. I was then crushed when I learned that Adleman, DeMarrais, and Huang had proved the same thing a year prior. OK, but at least it wasn’t, like, 50 years prior! Maybe if I kept at it, I’d reach the frontier soon enough.

Umesh, from the very beginning, raised the profound question of BQP’s relation to the polynomial hierarchy. Could we at least construct an oracle relative to which BQP⊄PH—or, closely related, relative to which P=NP≠BQP? Recursive Fourier Sampling was already a candidate for such a separation. I spent months trying to prove that candidate wasn’t in PH, but failed. That led me eventually to propose a very different problem, Forrelation, which seemed like a stronger candidate, although I couldn’t prove that either. Finally, in 2018, after four years of effort, Ran Raz and Avishay Tal proved that my Forrelation problem was not in PH, thereby resolving Umesh’s question after a quarter century.

We now know three different ways by which a quantum computer can not merely solve any BQP problem efficiently, but prove its answer to a classical skeptic via an interactive protocol! Using quantum communication, using two entangled (but non-communicating) quantum computers, or using cryptography (this last a breakthrough of Umesh’s PhD student Urmila Mahadev). It remains a great open problem, first posed to my knowledge by Daniel Gottesman, whether one can do it with none of these things.

To see many of the advantages of quantum computation over classical, we’ve learned that we need to broaden our vision beyond BQP (which is a class of languages), to promise problems (like estimating the expectation values of observables), sampling problems (like BosonSampling and Random Circuit Sampling), and relational problems (like the Yamakawa-Zhandry problem, subject of a recent breakthrough). It’s conceivable that quantum advantage could remain for such problems even if it turned out that P=BQP.

A much broader question is whether BQP captures all languages that can be efficiently decided using “reasonable physical resources.” What about chiral quantum field theories, like the Standard Model of elementary particles? What about quantum theories of gravity? Good questions!

Since it was Passover during the talk, I literally said “Dayenu” to Umesh: “if you had only given us BQP, that would’ve been enough! but you didn’t, you gave us so much more!”

Happy birthday Umesh!! We look forward to celebrating again on all your subsequent power-of-2 birthdays.

May 10, 2024

Matt Strassler Significant Chance of Major Aurora Outbreak!

I don’t use exclamation marks in blog post titles lightly. For those of us hoping to see the northern and southern lights (auroras) outside their usual habitat near the Earth’s poles, this is one of those rare weekends where the odds are in our favor. NOAA’s Space Weather Prediction Center has issued a rare G4 forecast (out of a range from G1 to G5) for a major geomagnetic “storm”.

Though the large and active sunspot from earlier this week has moved on, it has been followed by an even larger group of sunspots, so enormous that you can easily see them with eclipse glasses if you’ve kept your pair from last month.

A monster sunspot group on the Sun right now (May 9, 2024).

Powerful solar flares (explosions at the Sun’s visible surface) and the accompanying large coronal mass ejections (“CMEs”, huge clouds of subatomic particles that stream across space from the Sun toward the planets) keep coming, one after another; the second-largest of the week happened just a few hours ago. In the next 24-72 hours, the combined effects of these CMEs may drive the Earth’s magnetic field haywire, leading to northern and southern lights that are much stronger and much more equatorial than usual.

How far south might the northern lights reach? That’s hard to predict, unfortunately. But it wouldn’t be surprising if they reached midway across the United States, and across much of Europe.

If you decide to go looking, keep in mind that dark skies are so important; the auroras can seem quite bright in a dark sky, but they are easily lost to light pollution from city lights or even nearby street lights. Make sure to turn off your car headlights and let your eyes adjust to the dark for a few minutes. The auroras are typically to the north (in the northern hemisphere), but I’ve seen them directly overhead in a strong storm. They’re most often green, but other colors may appear, if you’re lucky. If you’re not sure whether you’re seeing them, take a photo; the camera can pick up dim light and its color more effectively than your eyes can.

As for when to go looking — auroras might happen at any time, from a few hours from now through the weekend, and would potentially be visible whenever the sky is dark. For more detailed information, there are two sources of data that I find useful to monitor:

  • First, at this site, you can find near-real-time data on the solar wind—the flow of particles from the Sun—from the ACE satellite, which orbits the Sun almost a million miles from Earth. If you see a sudden wildness in the data, that’s a good sign that a CME has probably passed this satellite, and will arrive at Earth in less than an hour.
  • Second, data on the strength of the geomagnetic storm can be found here — but be warned! It is provided only as an average over the past three hours, and only updated every three hours — and so it can be as much as three hours out of date. But if you see the “Kp index” in the red, up around 7 or above, something significant is happening. In a G4 storm, this index can reach 9.

Good luck!!

John Baez Hexagonal Tiling Honeycomb



This picture by Roice Nelson shows a remarkable structure: the hexagonal tiling honeycomb.

What is it? Roughly speaking, a honeycomb is a way of filling 3d space with polyhedra. The most symmetrical honeycombs are the ‘regular’ ones. For any honeycomb, we define a flag to be a chosen vertex lying on a chosen edge lying on a chosen face lying on a chosen polyhedron. A honeycomb is regular if its geometrical symmetries act transitively on flags.

The most familiar regular honeycomb is the usual way of filling Euclidean space with cubes. This cubic honeycomb is denoted by the symbol {4,3,4}, because a square has 4 edges, 3 squares meet at each corner of a cube, and 4 cubes meet along each edge of this honeycomb. We can also define regular honeycombs in hyperbolic space. For example, the order-5 cubic honeycomb is a hyperbolic honeycomb denoted {4,3,5}, since 5 cubes meet along each edge:



Coxeter showed there are 15 regular hyperbolic honeycombs. The hexagonal tiling honeycomb is one of these. But it does not contain polyhedra of the usual sort! Instead, it contains flat Euclidean planes embedded in hyperbolic space, each plane containing the vertices of infinitely many regular hexagons. You can think of such a sheet of hexagons as a generalized polyhedron with infinitely many faces. You can see a bunch of such sheets in the picture:



The symbol for the hexagonal tiling honeycomb is {6,3,3}, because a hexagon has 6 edges, 3 hexagons meet at each corner in a plane tiled by regular hexagons, and 3 such planes meet along each edge of this honeycomb. You can see that too if you look carefully.

A flat Euclidean plane in hyperbolic space is called a horosphere. Here’s a picture of a horosphere tiled with regular hexagons, yet again drawn by Roice:



Unlike the previous pictures, which are views from inside hyperbolic space, this uses the Poincaré ball model of hyperbolic space. As you can see here, a horosphere is a limiting case of a sphere in hyperbolic space, where one point of the sphere has become a ‘point at infinity’.

Be careful. A horosphere is intrinsically flat, so if you draw regular hexagons on it their internal angles are

2\pi/3 = 120^\circ

as usual in Euclidean geometry. But a horosphere is not ‘totally geodesic’: straight lines in the horosphere are not geodesics in hyperbolic space! Thus, a hexagon in hyperbolic space with the same vertices as one of the hexagons in the horosphere actually bulges out from the horosphere a bit — and its internal angles are less than 2\pi/3: they are

\arccos\left(-\frac{1}{3}\right) \approx 109.47^\circ

This angle may be familiar if you’ve studied tetrahedra. That’s because each vertex lies at the center of a regular tetrahedron, with its four nearest neighbors forming the tetrahedron’s corners.
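If you want to see the two angles side by side, here is a quick numerical check (my own illustration, not part of the original post), comparing the Euclidean hexagon angle with the tetrahedral angle \arccos(-1/3):

```python
import math

# Internal angle of a regular hexagon drawn on the (flat) horosphere
euclidean_angle = 2 * math.pi / 3

# Internal angle of the corresponding "bulging" hexagon in hyperbolic space,
# which equals the tetrahedral angle arccos(-1/3)
hyperbolic_angle = math.acos(-1.0 / 3.0)

print(math.degrees(euclidean_angle))   # 120.0
print(math.degrees(hyperbolic_angle))  # ~109.47
```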

It’s really these hexagons in hyperbolic space that are faces of the hexagonal tiling honeycomb, not those tiling the horospheres, though perhaps you can barely see the difference. This can be quite confusing until you think about a simpler example, like the difference between a cube in Euclidean 3-space and a cube drawn on a sphere in Euclidean space.

Connection to special relativity

There’s an interesting connection between hyperbolic space, special relativity, and 2×2 matrices. You see, in special relativity, Minkowski spacetime is \mathbb{R}^4 equipped with the nondegenerate bilinear form

(t,x,y,z) \cdot (t',x',y',z') = t t' - x x' - y y' - z z'

usually called the Minkowski metric. Hyperbolic space sits inside Minkowski spacetime as the hyperboloid of points \mathbf{x} = (t,x,y,z) with \mathbf{x} \cdot \mathbf{x} = 1 and t > 0. But we can also think of Minkowski spacetime as the space \mathfrak{h}_2(\mathbb{C}) of 2×2 hermitian matrices, using the fact that every such matrix is of the form

A =  \left( \begin{array}{cc} t + z & x - i y \\ x + i y & t - z \end{array} \right)

and

\det(A) =  t^2 - x^2 - y^2 - z^2

In these terms, the future cone in Minkowski spacetime is the cone of positive definite hermitian matrices:

\left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A > 0, \,  \mathrm{tr}(A) > 0 \right\}

Sitting inside this we have the hyperboloid

\mathcal{H} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A = 1, \,  \mathrm{tr}(A) > 0 \right\}

which is none other than hyperbolic space!
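As a sanity check on this identification (a small sketch of my own, not from the post), one can verify symbolically that the determinant of such a hermitian matrix reproduces the Minkowski metric and that its trace is 2t:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z', real=True)

# Hermitian matrix attached to the spacetime point (t, x, y, z)
A = sp.Matrix([[t + z, x - sp.I * y],
               [x + sp.I * y, t - z]])

print(sp.simplify(A.det()))    # t**2 - x**2 - y**2 - z**2
print(sp.simplify(A.trace()))  # 2*t
```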

Connection to the Eisenstein integers

Since the hexagonal tiling honeycomb lives inside hyperbolic space, which in turn lives inside Minkowski spacetime, we should be able to describe the hexagonal tiling honeycomb as sitting inside Minkowski spacetime. But how?

Back in 2022, James Dolan and I conjectured such a description, which takes advantage of the picture of Minkowski spacetime in terms of 2×2 matrices. And this April, working on Mathstodon, Greg Egan and I proved this conjecture!

I’ll just describe the basic idea here, and refer you elsewhere for details.

The Eisenstein integers \mathbb{E} are the complex numbers of the form

a + b \omega

where a and b are integers and \omega = \exp(2 \pi i/3) is a cube root of 1. The Eisenstein integers are closed under addition, subtraction and multiplication, and they form a lattice in the complex numbers:

Similarly, the set \mathfrak{h}_2(\mathbb{E}) of 2×2 hermitian matrices with Eisenstein integer entries gives a lattice in Minkowski spacetime, since we can describe Minkowski spacetime as \mathfrak{h}_2(\mathbb{C}).
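To see the closure under multiplication concretely, here is a tiny sketch (mine, not from the post) that multiplies Eisenstein integers written as integer pairs (a, b), using \omega^2 = -1 - \omega:

```python
# Eisenstein integers a + b*omega with omega = exp(2*pi*i/3), so omega^2 = -1 - omega.
# The product of two of them has the same form, so they are closed under multiplication.

def eisenstein_mul(p, q):
    """(a + b w)(c + d w) = (a c - b d) + (a d + b c - b d) w"""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c - b * d)

omega = (0, 1)
omega_sq = eisenstein_mul(omega, omega)
print(omega_sq)                         # (-1, -1), i.e. -1 - omega
print(eisenstein_mul(omega, omega_sq))  # (1, 0), i.e. omega^3 = 1
```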

Here’s the conjecture:

Conjecture. The points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are the centers of hexagons in a hexagonal tiling honeycomb.

Using known results, it’s relatively easy to show that there’s a hexagonal tiling honeycomb whose hexagon centers are all points in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}. The hard part is showing that every point in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} is a hexagon center. Points in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} are the same as 4-tuples of integers obeying an inequality (the \mathrm{tr}(A) > 0 condition) and a quadratic equation (the \det(A) = 1 condition). So, we’re trying to show that all 4-tuples obeying those constraints follow a very regular pattern.
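Here is a small brute-force illustration of that pattern (my own sketch, not part of the proofs described below). Writing a matrix in \mathfrak{h}_2(\mathbb{E}) with integer diagonal entries a, g and off-diagonal Eisenstein integer c + d\omega, the two constraints become a + g > 0 and a g - c^2 + c d - d^2 = 1, and we can simply enumerate solutions with small trace:

```python
from itertools import product

# Lattice points of h_2(E) on the hyperboloid H, written as integer 4-tuples
# (a, g, c, d) with a + g > 0 and a*g - c^2 + c*d - d^2 = 1 (determinant 1).
R = 4  # arbitrary small search cutoff
by_trace = {}
for a, g, c, d in product(range(-R, R + 1), repeat=4):
    if a + g > 0 and a * g - c * c + c * d - d * d == 1:
        by_trace.setdefault(a + g, []).append((a, g, c, d))

for tr in sorted(by_trace)[:3]:
    print(tr, len(by_trace[tr]))
# trace 2 -> 1 point (the identity), trace 3 -> 12 points:
# exactly the "hexagon center plus 12 nearest neighbors" pattern.
```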

Here are two proofs of the conjecture:

• John Baez, Line bundles on complex tori (part 5), The n-Category Café, April 30, 2024.

Greg Egan and I came up with the first proof. The basic idea was to assume there’s a point in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} that’s not a hexagon center, choose one as close as possible to the identity matrix, and then construct an even closer one, getting a contradiction. Shortly thereafter, someone on Mastodon by the name of Mist came up with a second proof, similar in strategy but different in detail. This increased my confidence in the result.

What’s next?

Something very similar should be true for another regular hyperbolic honeycomb, the square tiling honeycomb:


Here instead of the Eisenstein integers we should use the Gaussian integers, \mathbb{G}, consisting of all complex numbers

a + b i

where a and b are integers.

Conjecture. The points in the lattice \mathfrak{h}_2(\mathbb{G}) that lie on the hyperboloid \mathcal{H} are the centers of squares in a square tiling honeycomb.

I’m also very interested in how these results connect to algebraic geometry! I explained this in some detail here:

Line bundles on complex tori (part 4), The n-Category Café, April 26, 2024.

Briefly, the hexagon centers in the hexagonal tiling honeycomb correspond to principal polarizations of the abelian variety \mathbb{C}^2/\mathbb{E}^2. These are concepts that algebraic geometers know and love. Similarly, if the conjecture above is true, the square centers in the square tiling honeycomb will correspond to principal polarizations of the abelian variety \mathbb{C}^2/\mathbb{G}^2. But I’m especially interested in interpreting the other features of these honeycombs — not just the hexagon and square centers — using ideas from algebraic geometry.

Matt von Hippel Getting It Right vs Getting It Done

With all the hype around machine learning, I occasionally get asked if it could be used to make predictions for particle colliders, like the LHC.

Physicists do use machine learning these days, to be clear. There are tricks and heuristics, ways to quickly classify different particle collisions and speed up computation. But if you’re imagining something that replaces particle physics calculations entirely, or even replaces the LHC itself, then you’re misunderstanding what particle physics calculations are for.

Why do physicists try to predict the results of particle collisions? Why not just observe what happens?

Physicists make predictions not in order to know what will happen in advance, but to compare those predictions to experimental results. If the predictions match the experiments, that supports existing theories like the Standard Model. If they don’t, then a new theory might be needed.

Those predictions certainly don’t need to be made by humans: most of the calculations are done by computers anyway. And they don’t need to be perfectly accurate: in particle physics, every calculation is an approximation. But the approximations used in particle physics are controlled approximations. Physicists keep track of what assumptions they make, and how they might go wrong. That’s not something you can typically do in machine learning, where you might train a neural network with millions of parameters. The whole point is to be able to check experiments against a known theory, and we can’t do that if we don’t know whether our calculation actually respects the theory.

That difference, between caring about the result and caring about how you got there, is a useful guide. If you want to predict how a protein folds in order to understand what it does in a cell, then you will find AlphaFold useful. If you want to confirm your theory of how protein folding happens, it will be less useful.

Some industries just want the final result, and can benefit from machine learning. If you want to know what your customers will buy, or which suppliers are cheating you, or whether your warehouse is moldy, then machine learning can be really helpful.

Other industries are trying, like particle physicists, to confirm that a theory is true. If you’re running a clinical trial, you want to be crystal clear about how the trial data turn into statistics. You, and the regulators, care about how you got there, not just about what answer you got. The same can be true for banks: if laws tell you you aren’t allowed to discriminate against certain kinds of customers for loans, you need to use a method where you know what traits you’re actually discriminating against.

So will physicists use machine learning? Yes, and more of it over time. But will they use it to replace normal calculations, or replace the LHC? No, that would be missing the point.

May 09, 2024

Matt Strassler Increased Chance of Northern/Southern Lights

A couple of days ago, I noted a chance of auroras (a.k.a. northern and southern lights) this week. That chance just went up again, with a series of solar flares and coronal mass ejections. The chance of auroras being visible well away from their usual latitudes is pretty high in the 36-48 hour range… meaning the evening of May 10th into the morning of May 11th in both Europe (with the best chances) and in the US and Canada.

Keep in mind that timing and aurora strength are hard to predict, so no prediction is guaranteed; it could come to nothing, or the auroras could show up somewhat earlier and be stronger than expected.

Meanwhile, the SciComm 2 conference continues at the Perimeter Institute. As part of it, experimental particle physicist Clara Nellist gave a public talk to an enthusiastic audience last night, reviewing the LHC experiments and their achievements. You can find it on YouTube if you’d like to watch it.

May 08, 2024

Matt Strassler Speaking Next Wednesday in Northampton, Massachusetts

I’ll be spending the remainder of this week at the 2nd Scicomm Collider conference, hosted at the Perimeter Institute in Waterloo, Ontario, and organized by astrophysicist and writer Katie Mack. I’m very much looking forward to it!

Next week I’ll be back in Massachusetts, in the town of Northampton, where I’ll be speaking about my book. The event is at 7pm on Wednesday, May 15th at the lovely Broadside Bookshop. If you’re in the Pioneer Valley, please join me! And if you have friends in the area, please let them know.

Doug Natelson Wind-up nanotechnology

When I was a kid, I used to take allowance money and occasionally buy rubber-band-powered balsa wood airplanes at a local store.  Maybe you've seen these.  You wind up the rubber band, which stretches the elastomer and stores energy in the elastic strain of the polymer, as in Hooke's Law (though I suspect the rubber band goes well beyond the linear regime when it's really wound up, because of the higher order twisting that happens).  Rhett Allain wrote about how well you can store energy like this.  It turns out that the stored energy per mass of the rubber band can get pretty substantial. 

Carbon nanotubes are one of the most elastically strong materials out there.  A bit over a decade ago, a group at Michigan State did a serious theoretical analysis of how much energy you could store in a twisted yarn made from single-walled carbon nanotubes.  They found that the specific energy storage could get as large as several MJ/kg, as much as four times what you get with lithium ion batteries!

Now, a group in Japan has actually put this to the test, in this Nature Nano paper.  They get up to 2.1 MJ/kg, over the lithium ion battery mark, and the specific power (when they release the energy) at about \(10^{6}\) W/kg is not too far away from "non-cyclable" energy storage media, like TNT.  Very cool!  
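For a rough sense of scale (my own back-of-the-envelope numbers, not from the paper), here is the unit conversion behind that comparison, assuming a good lithium-ion cell stores roughly 250 Wh/kg:

```python
# Convert a typical lithium-ion specific energy to MJ/kg and compare with the
# 2.1 MJ/kg reported for the twisted carbon nanotube yarn.
WH_TO_J = 3600.0

li_ion_Wh_per_kg = 250.0                 # assumed ballpark figure
li_ion_MJ_per_kg = li_ion_Wh_per_kg * WH_TO_J / 1e6

nanotube_MJ_per_kg = 2.1                 # value quoted above

print(round(li_ion_MJ_per_kg, 2))                        # ~0.9 MJ/kg
print(round(nanotube_MJ_per_kg / li_ion_MJ_per_kg, 1))   # ~2.3x a Li-ion cell
```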

May 06, 2024

Tommaso Dorigo Move Over - The Talk I Will Not Give

Last week I was in Amsterdam, where I attended the first European AI for Fundamental Physics conference (EUCAIF). Unfortunately I could not properly follow the works there, as in the midst of it I got grounded by a very nasty bronchial bug. Then over the weekend I was able to drag myself back home, and today, still struggling with the after-effects, I am traveling to Rome for another relevant event.


May 04, 2024

n-Category Café Line Bundles on Complex Tori (Part 4)

Last time I introduced a 2-dimensional complex variety called the Eisenstein surface

E = \mathbb{C}/\mathbb{E} \times \mathbb{C}/\mathbb{E}

where \mathbb{E} \subset \mathbb{C} is the lattice of Eisenstein integers. We worked out the Néron–Severi group \mathrm{NS}(E) of this surface: that is, the group of equivalence classes of holomorphic line bundles on this surface, where we count two as equivalent if they’re isomorphic as topological line bundles. And we got a nice answer:

\mathrm{NS}(E) \cong \mathfrak{h}_2(\mathbb{E})

where \mathfrak{h}_2(\mathbb{E}) consists of 2 \times 2 hermitian matrices with Eisenstein integers as entries.

Now we’ll see how this is related to the ‘hexagonal tiling honeycomb’:

We’ll see an explicit bijection between so-called ‘principal polarizations’ of the Eisenstein surface and the centers of hexagons in this picture! We won’t prove it works — I hope to do that later. But we’ll get everything set up.

The hexagonal tiling honeycomb

This picture by Roice Nelson shows a remarkable structure: the hexagonal tiling honeycomb.

What is it? Roughly speaking, a honeycomb is a way of filling 3d space with polyhedra. The most symmetrical honeycombs are the ‘regular’ ones. For any honeycomb, we define a flag to be a chosen vertex lying on a chosen edge lying on a chosen face lying on a chosen polyhedron. A honeycomb is regular if its geometrical symmetries act transitively on flags.

The most familiar regular honeycomb is the usual way of filling Euclidean space with cubes. This cubic honeycomb is denoted by the symbol \{4,3,4\}, because a square has 4 edges, 3 squares meet at each corner of a cube, and 4 cubes meet along each edge of this honeycomb. We can also define regular honeycombs in hyperbolic space. For example, the order-5 cubic honeycomb is a hyperbolic honeycomb denoted \{4,3,5\}, since 5 cubes meet along each edge:

Coxeter showed there are 15 regular hyperbolic honeycombs. The hexagonal tiling honeycomb is one of these. But it does not contain polyhedra of the usual sort! Instead, it contains flat Euclidean planes embedded in hyperbolic space, each plane containing the vertices of infinitely many regular hexagons. You can think of such a sheet of hexagons as a generalized polyhedron with infinitely many faces. You can see a bunch of such sheets in the picture:

The symbol for the hexagonal tiling honeycomb is \{6,3,3\}, because a hexagon has 6 edges, 3 hexagons meet at each corner in a plane tiled by regular hexagons, and 3 such planes meet along each edge of this honeycomb. You can see that too if you look carefully.

A flat Euclidean plane in hyperbolic space is called a horosphere. Here’s a picture of a horosphere tiled with regular hexagons, yet again drawn by Roice:

Unlike the previous pictures, which are views from inside hyperbolic space, this uses the Poincaré ball model of hyperbolic space. As you can see here, a horosphere is a limiting case of a sphere in hyperbolic space, where one point of the sphere has become a ‘point at infinity’.

Be careful. A horosphere is intrinsically flat, so if you draw regular hexagons on it their internal angles are

2\pi/3 = 120^\circ

as usual in Euclidean geometry. But a horosphere is not ‘totally geodesic’: straight lines in the horosphere are not geodesics in hyperbolic space! Thus, a hexagon in hyperbolic space with the same vertices as one of the hexagons in the horosphere actually bulges out from the horosphere a bit — and its internal angles are less than 2\pi/3: they are

\arccos\left(-\frac{1}{3}\right) \approx 109.47^\circ

It’s really these hexagons in hyperbolic space that are faces of the hexagonal tiling honeycomb, not those tiling the horospheres, though perhaps you can barely see the difference. This can be quite confusing until you think about a simpler example, like the difference between a cube in Euclidean 3-space and a cube drawn on a sphere in Euclidean space.

Connection to special relativity

Johnson and Weiss have studied the symmetry group of the hexagonal tiling honeycomb:

They describe this group using the ring of Eisenstein integers:

\mathbb{E} = \{ a + b \omega \; \vert \; a, b \in \mathbb{Z} \} \subset \mathbb{C}

where \omega is the cube root of unity \exp(2 \pi i/3). And I believe their work implies this result:

Theorem. The orientation-preserving symmetries of the hexagonal tiling honeycomb form the group \mathrm{PGL}(2,\mathbb{E}).

I’ll sketch a proof later, starting from what they actually show.

For comparison, the group of all orientation-preserving symmetries of hyperbolic space forms the larger group \mathrm{PGL}(2,\mathbb{C}). This group is the same as \mathrm{PSL}(2,\mathbb{C})… and this naturally brings Minkowski spacetime into the picture!

You see, in special relativity, Minkowski spacetime is \mathbb{R}^4 equipped with the nondegenerate bilinear form

(t,x,y,z) \cdot (t',x',y',z') = t t' - x x' - y y' - z z'

usually called the Minkowski metric.

Hyperbolic space sits inside Minkowski spacetime as the hyperboloid of points \mathbf{x} = (t,x,y,z) with \mathbf{x} \cdot \mathbf{x} = 1 and t > 0. Equivalently, we can think of Minkowski spacetime as the space \mathfrak{h}_2(\mathbb{C}) of 2 \times 2 hermitian complex matrices, using the fact that every such matrix is of the form

A = \left( \begin{array}{cc} t + z & x - i y \\ x + i y & t - z \end{array} \right)

and

\det(A) = t^2 - x^2 - y^2 - z^2

One reason this viewpoint is productive is that the group of symmetries of Minkowski spacetime that preserve the orientation and also preserve the distinction between future and past is the projective special linear group \mathrm{PSL}(2,\mathbb{C}). The idea here is that any element g \in \mathrm{SL}(2,\mathbb{C}) acts on \mathfrak{h}_2(\mathbb{C}) by

A \mapsto g A g^\ast

This action clearly preserves the Minkowski metric (since it preserves the determinant of A) and also the orientation and the direction of time (because \mathrm{SL}(2,\mathbb{C}) is connected). However, multiples of the identity matrix, namely the matrices \pm I, act trivially. So, we get an action of the quotient group \mathrm{PSL}(2,\mathbb{C}).
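Here is a quick numerical illustration (my own, not from the post) that the action A \mapsto g A g^\ast preserves both hermiticity and the determinant, hence the Minkowski metric:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random hermitian A (a point of Minkowski spacetime) and random g in SL(2, C)
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = B + B.conj().T
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g = g / np.sqrt(np.linalg.det(g))          # rescale so det(g) = 1

A2 = g @ A @ g.conj().T                    # the action A -> g A g*

print(np.allclose(A2, A2.conj().T))                      # True: still hermitian
print(np.allclose(np.linalg.det(A2), np.linalg.det(A)))  # True: determinant preserved
```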

In these terms, the future cone in Minkowski spacetime is the cone of positive definite hermitian matrices:

\mathcal{K} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A > 0, \, \mathrm{tr}(A) > 0 \right\}

Sitting inside this we have the hyperboloid

\mathcal{H} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A = 1, \, \mathrm{tr}(A) > 0 \right\}

which is none other than hyperbolic space! The Minkowski metric on \mathfrak{h}_2(\mathbb{C}) induces the usual Riemannian metric on hyperbolic space (up to a change of sign).

Indeed, not only is the symmetry group of the hexagonal tiling honeycomb abstractly isomorphic to the subgroup

\mathrm{PGL}(2,\mathbb{E}) \subset \mathrm{PGL}(2,\mathbb{C}) = \mathrm{PSL}(2,\mathbb{C})

we’ve also seen this subgroup acts as orientation-preserving isometries of hyperbolic space. So it seems almost obvious that \mathrm{PGL}(2,\mathbb{E}) acts on hyperbolic space so as to preserve some hexagonal tiling honeycomb!

Constructing the hexagonal tiling honeycomb

Thus, the big question is: how can we actually construct a hexagonal tiling honeycomb inside \mathcal{H} that is preserved by the action of \mathrm{PGL}(2,\mathbb{E})? I want to answer this question.

Sitting in the complex numbers we have the ring \mathbb{E} of Eisenstein integers. This lets us define a lattice in Minkowski spacetime, called \mathfrak{h}_2(\mathbb{E}), consisting of 2 \times 2 hermitian matrices with entries that are Eisenstein integers. James Dolan and I conjectured this:

Conjecture. The points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are the centers of hexagons in a hexagonal tiling honeycomb.

I think Greg Egan has done enough work to make this clear. I will try to write up a proof here next time. Once this is done, it should be fairly easy to construct the other features of the hexagonal tiling honeycomb. They should all be related in various ways to the lattice \mathfrak{h}_2(\mathbb{E}).

Connection to the Néron–Severi group of the Eisenstein surface

But what does any of this have to do with the supposed theme of these blog posts: line bundles on complex tori? To answer this, we need to remember that last time I gave an explicit isomorphism between

  • the Néron–Severi group \mathrm{NS}(E) of the Eisenstein surface E = \mathbb{C}^2/\mathbb{E}^2

and

  • \mathfrak{h}_2(\mathbb{E}) (viewed as an additive group).

This isomorphism wasn’t new: experts know a lot about it. For example, under this correspondence, elements A \in \mathfrak{h}_2(\mathbb{E}) with \det(A) > 0 and \mathrm{tr}(A) > 0 correspond to elements of the Néron–Severi group coming from ample line bundles. Elements of the Néron–Severi group coming from ample line bundles are called polarizations.

Furthermore, elements A \in \mathfrak{h}_2(\mathbb{E}) with \det(A) = 1 and \mathrm{tr}(A) > 0 are known to correspond to certain specially nice polarizations called ‘principal’ polarizations.

So, the conjecture implies this:

Main Result. There is an explicit bijection between principal polarizations of the Eisenstein surface and centers of hexagons in the hexagonal tiling honeycomb.

There’s a lot more to say, but I’ll stop here for now, at least after filling in some details that I owe you.

Symmetries of the hexagonal tiling honeycomb

As shown already by Coxeter, the group of all isometries of hyperbolic space mapping the hexagonal tiling honeycomb to itself is the Coxeter group [6,3,3]. That means this group has a presentation with generators v, e, f, p and relations

v^2 = e^2 = f^2 = p^2 = 1

(v e)^6 = 1, \qquad (e f)^3 = 1, \qquad (f p)^3 = 1, \qquad (v f)^2 = 1, \qquad (v p)^2 = 1, \qquad (e p)^2 = 1

Each generator corresponds to a reflection in hyperbolic space — that is, an orientation-reversing transformation that preserves angles and distances. The products of pairs of generators are orientation preserving, and they generate an index-2 subgroup of [6,3,3] called [6,3,3]^+.

Johnson and Weiss describe the group [6,3,3]^+ using the Eisenstein integers. Namely, they show it’s isomorphic to the group \mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}).

But what the heck is that?

As usual, \mathrm{SL}(2,\mathbb{E}) is the group of 2 \times 2 matrices with entries in \mathbb{E} having determinant 1. But Johnson and Weiss define a slightly larger group \overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) to consist of all 2 \times 2 matrices with entries in \mathbb{E} that have determinant with absolute value 1. This in turn has a subgroup \overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E}) consisting of multiples of the identity \lambda I where \lambda \in \mathbb{E} has absolute value 1. Then they define \mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) = \overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})/\overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E}).

What does the group \mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) actually amount to? Since all the units in \mathbb{E} have absolute value 1 — they’re just the 6th roots of unity — \overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) is the same as the group \mathrm{GL}(2,\mathbb{E}) consisting of all invertible 2 \times 2 matrices with entries in \mathbb{E}. The subgroup \overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E}) consists of matrices \lambda I where \lambda is a 6th root of unity. If I’m not confused, this is just the center of \mathrm{GL}(2,\mathbb{E}). So what they’re calling \mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) is \mathrm{GL}(2,\mathbb{E}) modulo its center. This is usually called \mathrm{PGL}(2,\mathbb{E}).

All this is a bit confusing, but I think that with massive help from Johnson and Weiss we’ve shown this:

Theorem. The orientation-preserving symmetries of the hexagonal tiling honeycomb form the group \mathrm{PGL}(2,\mathbb{E}).

The interplay between \mathrm{PSL}(2,\mathbb{E}) and \mathrm{PGL}(2,\mathbb{E}) will become clearer next time: the latter group contains some 60 degree rotations that the former group does not!

n-Category Café Line Bundles on Complex Tori (Part 5)

The Eisenstein integers \mathbb{E} are the complex numbers of the form a + b \omega where a and b are integers and \omega = \exp(2 \pi i/3). They form a subring of the complex numbers and also a lattice:

Last time I explained how the space \mathfrak{h}_2(\mathbb{C}) of 2 \times 2 hermitian matrices is secretly 4-dimensional Minkowski spacetime, while the subset

\mathcal{H} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A = 1, \, \mathrm{tr}(A) > 0 \right\}

is 3-dimensional hyperbolic space. Thus, the set \mathfrak{h}_2(\mathbb{E}) of 2 \times 2 hermitian matrices with Eisenstein integer entries forms a lattice in Minkowski spacetime, and I conjectured that \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} consists exactly of the hexagon centers in the hexagonal tiling honeycomb — a highly symmetrical structure in hyperbolic space, discovered by Coxeter, which looks like this:

Now Greg Egan and I will prove that conjecture.

Last time, based on the work of Johnson and Weiss, we saw that the orientation-preserving symmetries of the hexagonal tiling honeycomb form the group \mathrm{PGL}(2,\mathbb{E}). This is not just an abstract isomorphism of groups: it’s an isomorphism of groups acting on hyperbolic space, where \mathrm{GL}(2,\mathbb{E}) acts on \mathfrak{h}_2(\mathbb{C}) and its subset \mathcal{H} by

g \colon A \mapsto g A g^\ast

and this action descends to the quotient \mathrm{PGL}(2,\mathbb{E}).

By general abstract nonsense about Coxeter groups, the orientation-preserving symmetries of the hexagonal tiling honeycomb act transitively on the set of hexagon centers. Thus, if we choose a suitable point p \in \mathcal{H}, we can get all the hexagon centers by acting on this one. Then the set of hexagon centers is this:

\mathrm{Hex} = \{ g p g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \}

But what’s a suitable point p? I claim that the identity matrix will do, so

\mathrm{Hex} = \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \}

Once we show this, we can study the set \mathrm{Hex} in detail, allowing us to prove the result conjectured last time:

Theorem. \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} = \mathrm{Hex}, so the points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are the centers of hexagons in a hexagonal tiling honeycomb.

But first things first: let’s see why the identity matrix can serve as a hexagon center!

The 12 hexagon centers closest to the identity

Suppose we take the identity as a hexagon center and get a bunch of points in hyperbolic space by acting on it by all possible transformations in \mathrm{GL}(2,\mathbb{E}). We get this set:

\mathrm{Hex} = \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \}

But how do we know this is right? We need to check that the points in this set look like the hexagon centers in here:

with the identity smack dab in the middle.

As you can see from the picture, each hexagon center should have 12 nearest neighbors. But does it work that way for our proposed set \mathrm{Hex}? It will be enough to check that the identity matrix has 12 nearest neighbors in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}, and check that these 12 points are in \mathrm{Hex}. A symmetry argument then shows that each point in \mathrm{Hex} has 12 nearest neighbors arranged in the same pattern, so the points in \mathrm{Hex} are the centers of the hexagons in a hexagonal tiling honeycomb.

It’s easy to measure the distance to the identity matrix if you remember your hyperbolic trig. If we think of \mathfrak{h}_2(\mathbb{C}) as Minkowski spacetime by writing a point as

A = \left( \begin{array}{cc} t + z & x - i y \\ x + i y & t - z \end{array} \right)

then the time coordinate is \mathrm{tr}(A)/2. The distance in hyperbolic space from the identity to a point A \in \mathcal{H} is then \arccosh(\mathrm{tr}(A)/2).

But this is a monotonic function of \mathrm{tr}(A). So let’s find the points A \in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} with the smallest possible trace — not counting the identity itself, which has trace 2. We hope there are 12.

Greg Egan found them:

(1) \qquad \left( \begin{array}{cc} 2 & \zeta^k \\ \overline{\zeta}^k & 1 \end{array} \right)

and

(2) \qquad \left( \begin{array}{cc} 1 & \zeta^k \\ \overline{\zeta}^k & 2 \end{array} \right)

where k = 0,1,2,3,4,5 and \zeta = -\omega. These matrices are the 6 magenta points and 6 yellow points shown here, while the black point is the identity:

The dark blue points are hexagon vertices, which are not so important right now. By the way, the significance of \zeta = -\omega is that it’s a primitive sixth root of unity; so is \overline{\zeta} = \zeta^{-1}. So these give the hexagonal pattern we seek.

These 12 matrices clearly lie in \mathfrak{h}_2(\mathbb{E}), and they have determinant 1 and positive trace so they lie in \mathcal{H}. But the good part is this:

Lemma 1. The 12 matrices in equations (1) and (2) are the matrices in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} that are as close as possible to the identity matrix without being equal to it. In other words, they have the smallest possible trace > 2 for matrices in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}.

I’ll put the proof of this and all the other lemmas in an appendix.

Now, why do these 12 matrices lie in

\mathrm{Hex} = \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \} ?

Greg found 12 matrices g \in \mathrm{GL}(2,\mathbb{E}) that do the job, namely

u_k = \left( \begin{array}{cc} 1 & \zeta^k \\ 0 & 1 \end{array} \right)

and

u_k^\ast = \left( \begin{array}{cc} 1 & 0 \\ \overline{\zeta}^k & 1 \end{array} \right)

where k = 0,1,2,3,4,5.
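A quick numerical check (my own, not Greg’s computer algebra) that these matrices really do produce the 12 nearest neighbors: g g^\ast for g = u_k gives the matrices in equation (1), and g g^\ast for g = u_k^\ast gives those in equation (2):

```python
import numpy as np

zeta = -np.exp(2j * np.pi / 3)     # zeta = -omega, a primitive 6th root of unity

def u(k):
    return np.array([[1, zeta**k], [0, 1]])

for k in range(6):
    g1 = u(k)                      # u_k
    g2 = u(k).conj().T             # u_k*
    print(np.round(g1 @ g1.conj().T, 3))   # [[2, zeta^k], [conj(zeta)^k, 1]]
    print(np.round(g2 @ g2.conj().T, 3))   # [[1, zeta^k], [conj(zeta)^k, 2]]
```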

A concrete construction of all the hexagon centers

Now we can get a concrete way to construct every element of the set

\mathrm{Hex} = \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \}

Lemma 2. The group \mathrm{SL}(2,\mathbb{E}) consists of finite products of matrices of the form

u_k = \left( \begin{array}{cc} 1 & \zeta^k \\ 0 & 1 \end{array} \right) \qquad \text{and} \qquad u_k^\ast = \left( \begin{array}{cc} 1 & 0 \\ \overline{\zeta}^k & 1 \end{array} \right)

for k = 0, 1, 2, 3, 4, 5.

Lemma 3. Every element of \mathrm{GL}(2,\mathbb{E}) is an element of \mathrm{SL}(2,\mathbb{E}) multiplied on the right by some power of

h = \left( \begin{array}{cc} \zeta & 0 \\ 0 & 1 \end{array} \right)

Notice that this element h is in \mathrm{GL}(2,\mathbb{E}) but not in \mathrm{SL}(2,\mathbb{E}), and it acts on Minkowski space as a 60 degree rotation in the x y plane, giving the rotational symmetry in Greg’s image:

While we can implement this 60 degree rotation with an element of \mathrm{SL}(2,\mathbb{C}), namely

\pm \left( \begin{array}{cc} \exp(2 \pi i / 12) & 0 \\ 0 & \exp(-2 \pi i / 12) \end{array} \right)

we cannot do it with any element of \mathrm{SL}(2,\mathbb{E}). This is why we need to bring \mathrm{GL}(2,\mathbb{E}) into the game.

Lemma 4. The set \mathrm{Hex} equals the set of matrices g g^\ast where g is a finite product of matrices of the form

u_k = \left( \begin{array}{cc} 1 & \zeta^k \\ 0 & 1 \end{array} \right) \qquad \text{and} \qquad u_k^\ast = \left( \begin{array}{cc} 1 & 0 \\ \overline{\zeta}^k & 1 \end{array} \right)

for k = 0, 1, 2, 3, 4, 5.

The theorem: first proof

Now we outline two proofs of the conjecture from last time. Greg did the hard part of the first proof, which uses some computer algebra we will only sketch. But the basic idea is this. We want to show that the points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are precisely the centers of the hexagons in our hexagonal tiling honeycomb. It’s easy to show that all the hexagon centers lie in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}. The hard part is showing the converse. For this we assume we’ve got a point in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} that’s not a hexagon center. We assume it’s as close as possible to the identity matrix. Then, we’ll find another such point that’s even closer — so no such point could exist.

Theorem. The points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are precisely the centers of hexagons in a hexagonal tiling honeycomb, since

\mathrm{Hex} = \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}

Proof. Part of this theorem is easy. Unfolding the definitions, we need to show

\{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \} = \{ A \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, \det(A) = 1, \; \mathrm{tr}(A) > 0 \}

But the only units in the Eisenstein integers are the 6th roots of unity, so for any g \in \mathrm{GL}(2,\mathbb{E}) its determinant is one of those, so \det(g g^\ast) = 1 and of course \mathrm{tr}(g g^\ast) > 0. This shows the set on the left-hand side is included in the set on the right-hand side.

So, the hard part is to show the reverse inclusion:

\{ A \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, \det(A) = 1, \; \mathrm{tr}(A) > 0 \} \subseteq \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \}

By Lemma 4, it suffices to assume A \in \mathfrak{h}_2(\mathbb{E}) has \det(A) = 1 and \mathrm{tr}(A) > 0, and prove that A is of the form g g^\ast, where g is a finite product of matrices

u_k = \left( \begin{array}{cc} 1 & \zeta^k \\ 0 & 1 \end{array} \right) \qquad \text{and} \qquad u^\ast_k = \left( \begin{array}{cc} 1 & 0 \\ \overline{\zeta}^k & 1 \end{array} \right)

for k = 0, 1, 2, 3, 4, or 5.

For a contradiction, suppose there exists A \in \mathfrak{h}_2(\mathbb{E}) with \det(A) = 1 and \mathrm{tr}(A) > 0 that is not of this form. Since the set of such A is discrete, we can choose one with the smallest possible trace. We now find one with a smaller trace, namely either u_k A u^\ast_k or u^\ast_k A u_k for some k.

We can write

A = \left( \begin{array}{cc} a & c + d \omega \\ (c - d) - d \omega & g \end{array} \right)

for integers a, c, d, g obeying the extra conditions:

\begin{array}{rcl} a + g &>& 0 \\ a g - c^2 + c d - d^2 &=& 1 \end{array}

If we act on A with each of the 12 elements u_k, u^\ast_k \in \mathrm{SL}(2,\mathbb{E}), the changes in the trace of the original matrix are linear expressions in either a, c and d or g, c and d. It will only be impossible to reduce the trace if all 12 expressions are non-negative, for parameters where the determinant is 1.

It is easier to see what’s happening on a plot:

We would need to find values of a and g such that the green ellipse (the determinant condition) has a point with integer coordinates inside the smaller of the two hexagons (one of which scales with a, the other with g).

The shapes of the ellipse and hexagons are such that if the ellipse passed through any one hexagon vertex it would pass through all of them.

We can write the changes in the trace as:

\begin{array}{rcl} \delta_k & = & \mathrm{tr}(u_k A u^\ast_k) - \mathrm{tr}(A) \\ & = & g + 2 c \cos\left(\frac{\pi k}{3}\right) + d \left(-\sqrt{3} \sin\left(\frac{\pi k}{3}\right) - \cos\left(\frac{\pi k}{3}\right)\right) \\ \Delta_k & = & \mathrm{tr}(u^\ast_k A u_k) - \mathrm{tr}(A) \\ & = & a + 2 c \cos\left(\frac{\pi k}{3}\right) + d \left(-\sqrt{3} \sin\left(\frac{\pi k}{3}\right) - \cos\left(\frac{\pi k}{3}\right)\right) \end{array}

Setting these to zero gives us the sides of the hexagons, while the vertices are found by solving:

\delta_k = \delta_{k+1} = 0

to obtain:

\begin{array}{rcl} c & = & \frac{1}{3} g \left(\sqrt{3} \sin\left(\frac{\pi k}{3}\right) - \cos\left(\frac{\pi k}{3}\right)\right) \\ d & = & \frac{2}{3} g \left(\cos\left(\frac{\pi k}{3}\right) - \cos\left(\frac{1}{3} \pi (k+1)\right)\right) \end{array}

The result for \Delta_k is the same, but with g replaced by a. Substituting these values into the formula for the determinant of A and equating that to 1 gives us:

a g - \frac{g^2}{3} = 1

and for \Delta_k:

a g - \frac{a^2}{3} = 1

Without loss of generality we can assume a \le g, and use the curve defined by the second equation as the boundary for the region in the (a,g) parameter space where the determinant ellipse contains points inside the a-hexagon. The plot below shows that this only happens for a = g = 1, the identity matrix.

So, unless we are starting with the identity matrix, we can always act with one of the 12 elements u_k, u^\ast_k \in \mathrm{SL}(2,\mathbb{E}) and get an element with a smaller positive trace.       █
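The descent in this proof is easy to try out numerically. Here is a small sketch of my own (floating point, not Greg’s exact computer algebra): starting from any matrix g g^\ast built from the u_k, repeatedly conjugate by whichever of the 12 elements reduces the trace the most, until we reach the identity:

```python
import numpy as np

zeta = -np.exp(2j * np.pi / 3)
U = [np.array([[1, zeta**k], [0, 1]]) for k in range(6)]
U += [u.conj().T for u in U]               # the twelve elements u_k and u_k*

def descend(A, tol=1e-9):
    """Conjugate A by u_k or u_k* until the trace drops to 2 (the identity)."""
    steps = 0
    while np.trace(A).real > 2 + tol:
        A = min((g @ A @ g.conj().T for g in U),
                key=lambda M: np.trace(M).real)
        steps += 1
    return np.round(A, 6), steps

# Example: a hexagon center built from a word in the u_k and u_k*
g = U[0] @ U[7] @ U[2] @ U[9]
A = g @ g.conj().T
final, steps = descend(A)
print(final)   # the identity matrix
print(steps)   # number of trace-reducing steps used
```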

The theorem: second proof

After Greg gave the above proof on Mathstodon, Mist gave a different proof which uses more about Coxeter groups. In this approach, the argument that keeps reducing the trace of a purported counterexample is replaced by the standard fact that every Coxeter group acts transitively on the chambers of its Coxeter complex. I will quote their proof word for word.

Theorem. The points in the lattice \mathfrak{h}_2(\mathbb{E}) that lie on the hyperboloid \mathcal{H} are precisely the centers of hexagons in a hexagonal tiling honeycomb, since

\mathrm{Hex} = \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}

Proof. Let me start with \mathbb{R}^4 with elements denoted (t, x, y, z) and equipped with the negative of the Minkowski form, that is, -t^2 + x^2 + y^2 + z^2. Inside here, I choose vectors

\begin{array}{ccl} e_1 &=& (0, \frac{\sqrt{3}}{2}, \frac{1}{2}, 0) \\ e_2 &=& (0, -1, 0, 0) \\ e_3 &=& (\frac{1}{2}, \frac{1}{2}, -\frac{\sqrt{3}}{2}, \frac{1}{2}) \\ e_4 &=& (0, 0, 0, -1) \end{array}

By checking inner products, we see that these vectors and the aforementioned bilinear form determine a copy of the canonical representation of the rank-4 Coxeter group with diagram

\text{o--6--o-----o-----o}

where the 6 means that the edge is labeled ‘6’. My notation for the canonical representation is consistent with that of the Wikipedia article Coxeter complex.

By the general theory, the canonical representation acts transitively on the set of chambers, where the fundamental chamber is the tetrahedral cone cut out by the hyperplanes (through the origin) which are orthogonal to e_1, e_2, e_3, e_4. Direct computation (e.g. matrix inverse) shows us that the extremal rays of the fundamental chamber are given by the vectors

\begin{array}{c} (3, 0, -\sqrt{3}, 0) \\ (4, 1, -\sqrt{3}, 0) \\ (1, 0, 0, 0) \\ (1, 0, 0, 1) \end{array}

Here I have dropped constant factors, but one can (and perhaps should) normalize to ensure Minkowski norm 1. Now I check explicitly that these vectors match up with Greg’s earlier explicit description of “a portion of the honeycomb”:

  • The vector \frac{1}{\sqrt{6}}(3, 0, -\sqrt{3}, 0) is one of the blue vertices of the hexagon centered on the black point.
  • The vector (4, 1, -\sqrt{3}, 0) after normalization becomes the midpoint of the two blue vertices indexed by k = 0 and k = 1.
  • The vector (1, 0, 0, 0) is the identity matrix, i.e. the black point.
  • The vector (1, 0, 0, 1) is the null vector for one of the two horospheres which contains the hexagon centered on the black point.

According to the Wikipedia article Hexagonal tiling honeycomb, the desired honeycomb is constructed from the aforementioned Coxeter group by applying the Wythoff construction with only the first vertex circled. This confirms that the hexagon centers correspond to the vector (1, 0, 0, 0) and its images under the Coxeter group action.

It remains to show that the hexagon centers coincide with the elements of the Eisenstein lattice with Minkowski norm 1.

To show that the hexagon centers are contained in the Eisenstein lattice, it suffices to show that the Eisenstein lattice is invariant under the Coxeter group action. This follows by checking the action of each of the simple reflections:

v \mapsto v - 2 \langle v, e_i \rangle e_i.

In more detail, for e_1, the inner product \langle v, e_1 \rangle is a half-integer if v is in the Eisenstein lattice, and the claim follows because e_1 is in the Eisenstein lattice. The same statement applies for e_2 and e_4, and ‘half-integer’ can even be replaced by ‘integer.’ For e_3, the inner product \langle v, e_3 \rangle is an integer multiple of \sqrt{3}/2, and the claim follows because \sqrt{3} e_3 is in the Eisenstein lattice.

To show that the elements of the Eisenstein lattice with Minkowski norm 1 are hexagon centers, it suffices to show this statement within the fundamental chamber. Let n := (1, 0, 0, 1) be the null vector from before. Observe the following:

  • If v lies in the forward light cone, i.e. \mathrm{norm}(v) > 0 and t > 0, then \langle v, n \rangle > 0.
  • If v lies in the Eisenstein lattice, then \langle v, n \rangle is an integer.
  • If v lies in the fundamental chamber and satisfies \mathrm{norm}(v) = 1, then \langle v, n \rangle < 2. (Sketch: Restricting to \mathrm{norm}(v) = 1 gives hyperbolic geometry, and the interior-of-horosphere \langle v, n \rangle < 2 is convex, so it suffices to check the special cases when v is a vertex of the fundamental chamber.)

These observations imply that, if v is an element of the Eisenstein lattice with \mathrm{norm}(v) = 1 lying in the fundamental chamber, then \langle v, n \rangle = 1.

Next, write v = (t, x, y, z). Since \langle v, n \rangle = 1 rewrites as t - z = 1, the relation \mathrm{norm}(v) = 1 rewrites as x^2 + y^2 = 2z. This tells us that the elements of the Eisenstein lattice with \mathrm{norm}(v) = 1 and \langle v, n \rangle = 1 are in bijection with the Eisenstein integers via

x + i y \;\; \text{corresponds to} \;\; ((x^2 + y^2)/2 + 1, \, x, \, y, \, (x^2 + y^2)/2).

It is easy to check that the simple reflections e_1, e_2, e_3 preserve the aforementioned set of vectors v and this bijection intertwines those simple reflections with the usual reflection symmetries of the Eisenstein integers (viewed as the vertices of the equilateral triangle lattice). Therefore, all of the aforementioned vectors v can be brought to (1, 0, 0, 0) via a Coxeter group action, as desired.       █
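It is easy to verify the very first step of this argument numerically. The following sketch (mine, not part of the quoted proof) checks that the Gram matrix of e_1, \dots, e_4 under the stated bilinear form equals -\cos(\pi/m_{ij}) for the Coxeter matrix of the diagram o--6--o-----o-----o:

```python
import numpy as np

# Bilinear form: the negative of the Minkowski form, -t^2 + x^2 + y^2 + z^2
form = np.diag([-1.0, 1.0, 1.0, 1.0])

s3 = np.sqrt(3)
e = np.array([[0.0,  s3/2,  0.5,  0.0],
              [0.0, -1.0,   0.0,  0.0],
              [0.5,  0.5, -s3/2,  0.5],
              [0.0,  0.0,   0.0, -1.0]])

# Coxeter matrix for o--6--o-----o-----o (unlabeled edges mean 3, no edge means 2)
m = np.array([[1, 6, 2, 2],
              [6, 1, 3, 2],
              [2, 3, 1, 3],
              [2, 2, 3, 1]])

gram = e @ form @ e.T
print(np.allclose(gram, -np.cos(np.pi / m)))   # True
```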

What’s next?

A very similar theorem should be true for another regular hyperbolic honeycomb, the square tiling honeycomb:

Here instead of the Eisenstein integers we should use the Gaussian integers, \mathbb{G}, consisting of all complex numbers a + b i with a, b \in \mathbb{Z}.

Conjecture. The points in the lattice \mathfrak{h}_2(\mathbb{G}) that lie on the hyperboloid \mathcal{H} are the centers of squares in a square tiling honeycomb.

I’m also very interested in how these results connect to algebraic geometry! That’s the real theme of this series, and I discussed the connection last time. Briefly, the hexagon centers in the hexagonal tiling honeycomb correspond to principal polarizations of the abelian variety \mathbb{C}^2/\mathbb{E}^2. These are concepts that algebraic geometers know and love. Similarly, if the conjecture above is true, the square centers in the square tiling honeycomb will correspond to principal polarizations of the abelian variety \mathbb{C}^2/\mathbb{G}^2. But I’m especially interested in interpreting the other features of these honeycombs — not just the hexagon and square centers — using ideas from algebraic geometry.

Proofs of lemmas

Lemma 1. The 12 matrices

\left( \begin{array}{cc} 2 & \zeta^k \\ \overline{\zeta}^k & 1 \end{array} \right), \qquad \left( \begin{array}{cc} 1 & \zeta^k \\ \overline{\zeta}^k & 2 \end{array} \right) \qquad \qquad k = 0,1,2,3,4,5

are the points in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} that are as close as possible to the identity matrix without being equal to it. In other words, they have the smallest possible trace > 2 for matrices in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H}.

Proof. Matrices in \mathfrak{h}_2(\mathbb{E}) \cap \mathcal{H} are of the form

A = \left( \begin{array}{cc} t + z & x - i y \\ x + i y & t - z \end{array} \right)

where t + z, t - z \in \mathbb{Z}, x + i y \in \mathbb{E}, the trace 2t is > 0 and the determinant t^2 - z^2 - |x + i y|^2 is 1. Since t + z and t - z are integers, t and z must be integers divided by 2.

The smallest possible trace for such a matrix is 2, realized only by the identity matrix. We are looking at the second smallest possible trace, which is 3. So, we have t = 3/2 and t^2 - z^2 - |x + i y|^2 = 1, i.e. z^2 + |x + i y|^2 = 5/4. Since z is a half-integer and x + i y is an Eisenstein integer, the only options are to let z = \pm \frac{1}{2} and let x + i y be an Eisenstein integer of norm 1, i.e. a sixth root of unity. These give the matrices

\left( \begin{array}{cc} 2 & \zeta^k \\ \overline{\zeta}^k & 1 \end{array} \right), \qquad \left( \begin{array}{cc} 1 & \zeta^k \\ \overline{\zeta}^k & 2 \end{array} \right)

for k = 0,1,2,3,4,5.       █

Lemma 2. The group \mathrm{SL}(2,\mathbb{E}) consists of finite products of matrices of the form

u_k = \left( \begin{array}{cc} 1 & \zeta^k \\ 0 & 1 \end{array} \right) \qquad \text{and} \qquad u^\ast_k = \left( \begin{array}{cc} 1 & 0 \\ \overline{\zeta}^k & 1 \end{array} \right)

for k = 0, 1, 2, 3, 4, 5.

Proof. In Section 8 of Quadratic integers and Coxeter groups, Johnson and Weiss cite Bianchi to say that \mathrm{SL}(2,\mathbb{E}) is generated by these matrices:

\left( \begin{array}{cc} 0 & 1 \\ -1 & 0 \end{array} \right), \qquad \left( \begin{array}{cc} 1 & 0 \\ 1 & 1 \end{array} \right), \qquad \left( \begin{array}{cc} 1 & 0 \\ \omega & 1 \end{array} \right)

The second and third of these matrices equal u^\ast_0 and u^\ast_2, respectively, so we just need to write the first,

\left( \begin{array}{cc} 0 & 1 \\ -1 & 0 \end{array} \right),

as a product of matrices u_k and u^\ast_k (or their inverses, which are again matrices of this form). Since

\left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right) = \left( \begin{array}{cc} 0 & 1 \\ -1 & 0 \end{array} \right) \left( \begin{array}{cc} 0 & -1 \\ 1 & 1 \end{array} \right)

we have

\left( \begin{array}{cc} 0 & 1 \\ -1 & 0 \end{array} \right) = \left( \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array} \right) \left( \begin{array}{cc} 0 & -1 \\ 1 & 1 \end{array} \right)^{-1}

Since the first matrix in this product is u_0, it suffices to note that

\left( \begin{array}{cc} 0 & -1 \\ 1 & 1 \end{array} \right) = \left( \begin{array}{cc} 1 & -1 \\ 0 & 1 \end{array} \right) \left( \begin{array}{cc} 1 & 0 \\ 1 & 1 \end{array} \right)

is the product of u_3 and u^\ast_0.       █

Lemma 3. Every element of \mathrm{GL}(2,\mathbb{E}) is an element of \mathrm{SL}(2,\mathbb{E}) multiplied on the right by some power of

h = \left( \begin{array}{cc} \zeta & 0 \\ 0 & 1 \end{array} \right)

Proof. The determinant of any g \in \mathrm{GL}(2,\mathbb{E}) is some sixth root of unity, and the determinant of h is a primitive sixth root of unity, so for some power h^k we have \det(g h^k) = 1 and thus g h^k \in \mathrm{SL}(2,\mathbb{E}). It follows that g equals some element of \mathrm{SL}(2,\mathbb{E}), namely g h^k, multiplied on the right by some power of h, namely h^{-k}.       █

Lemma 4. The set \mathrm{Hex} = \{ g g^\ast \in \mathfrak{h}_2(\mathbb{E}) \, \vert \, g \in \mathrm{GL}(2,\mathbb{E}) \} equals the set of matrices g g^\ast where g is a finite product of the matrices u_k and u_k^\ast.

Proof. By Lemma 3, \mathrm{Hex} is exactly the set of matrices (g h^k) (g h^k)^\ast where g \in \mathrm{SL}(2,\mathbb{E}) and

h = \left( \begin{array}{cc} \zeta & 0 \\ 0 & 1 \end{array} \right)

But the adjoint of h is its inverse so

(g h^k) (g h^k)^\ast = g h^k (h^k)^\ast g^\ast = g g^\ast

Thus, by Lemma 2, \mathrm{Hex} is exactly the set of matrices g g^\ast where g is a finite product of the matrices u_k and u_k^\ast.       █

May 03, 2024

Matt von Hippel Peer Review in Post-scarcity Academia

I posted a link last week to a dialogue written by a former colleague of mine, Sylvain Ribault. Sylvain’s dialogue is a summary of different perspectives on academic publishing. Unlike certain more famous dialogues written by physicists, Sylvain’s account doesn’t have a clear bias: he’s trying to set out the concerns different stakeholders might have and highlight the history of the subject, without endorsing one particular approach as the right one.

The purpose of such a dialogue is to provoke thought, and true to its purpose, the dialogue got me thinking.

Why do peer review? Why do we ask three or so people to read every paper, comment on it, and decide whether it should be published? While one can list many reasons, they seem to fall into two broad groups:

  1. We want to distinguish better science from worse science. We want to reward the better scientists with jobs and grants and tenure. To measure whether scientists are better, we want to see whether they publish more often in the better journals. We then apply those measures on up the chain, funding universities more when they have better scientists, and supporting grant programs that bring about better science.
  2. We want published science to be true. We want to make sure that when a paper is published that the result is actually genuine, free both from deception and from mistakes. We want journalists and the public to know which scientific results are valid, and we want scientists to know what results they can base their own research on.

The first set of goals is a product of scarcity. If we could pay every scientist and fund every scientific project with no cost, we wouldn’t need to worry so much about better and worse science. We’d fund it all and see what happens. The second set of goals is more universal: the whole point of science is to find out the truth, and we want a process that helps to achieve that.

My approach to science is to break problems down. What happens if we had only the second set of concerns, and not the first?

Well, what happens to hobbyists?

I’ve called hobby communities a kind of “post-scarcity academia”. Hobbyists aren’t trying to get jobs doing their hobby or get grants to fund it. They have their day jobs, and research their hobby as a pure passion project. There isn’t much need to rank which hobbyists are “better” than others, but they typically do care about whether what they write is true. So what happens when it’s not?

Sometimes, not much.

My main hobby community was Dungeons and Dragons. In a game with over 50 optional rulebooks covering multiple partially compatible editions, there were frequent arguments about what the rules actually meant. Some were truly matters of opinion, but some were true misunderstandings, situations where many people thought a rule worked a certain way until they heard the right explanation.

One such rule regarded a certain type of creature called a Warbeast. Warbeasts, like Tolkien’s Oliphaunts, are “upgraded” versions of more normal wild animals, bred and trained for war. There were rules to train a Warbeast, and people interpreted these rules differently: some thought you could find an animal in the wild and train it to become a Warbeast, others thought the rules were for training a creature that was already a Warbeast to fight.

I supported the second interpretation: you can train an existing Warbeast, you can’t train a wild animal to make it into a Warbeast. As such, keep in mind, I’m biased. But every time I explained the reasoning (pointing out that the text was written in the context of an earlier version of the game, and how the numbers in it matched up with that version), people usually agreed with me. And yet, I kept seeing people use the other interpretation. New players would come in asking how to play the game, and get advised to go train wild animals to make them into Warbeasts.

Ok, so suppose the Dungeons and Dragons community had a peer review process. Would that change anything?

Not really! The wrong interpretation was popular. If whoever first proposed it got three random referees, there’s a decent chance none of them would spot the problem. In good science, sometimes the problems with an idea are quite subtle. Referees will spot obvious issues (and not even all of those!), but only the most diligent review (which sometimes happens in mathematics, and pretty much nowhere else) can spot subtle flaws in an argument. For an experiment, you sometimes need more than that: not just a review, but an actual replication.

What would have helped the Dungeons and Dragons community? Not peer review, but citations.

Suppose that, every time someone suggested you could train a wild animal to make it a Warbeast, they had to link to the first post suggesting you could do this. Then I could go to that first post, and try to convince the author that my interpretation was correct. If I succeeded, the author could correct their post, and then every time someone followed one of these citation links it would tell them the claim was wrong.

Academic citations don’t quite work like this. But the idea is out there. People have suggested letting anyone who wants to review a paper, and publishing the reviews next to the piece like comments on a blog post. Sylvain’s dialogue mentions a setup like this, and some of the risks involved.

Still, a setup like that would have gone a long way towards solving the problem for the Dungeons and Dragons community. It has me thinking that something like that is worth exploring.

April 29, 2024

Doug Natelson Moiré and making superlattices

One of the biggest condensed matter trends in recent years has been the stacking of 2D materials and the development of moiré lattices.  The idea is, take a layer of 2D material and stack it either (1) on itself but with a twist angle, or (2) on another material with a slightly different lattice constant.  Because of interactions between the layers, the electrons in the material have an effective potential energy that has a spatial periodicity associated with the moiré pattern that results.  Twisted stacking of hexagonal lattice materials (like graphene or many of the transition metal dichalcogenides) results in a triangular moiré lattice with a moiré lattice constant that depends on twist angle.  Some of the most interesting physics in these systems seems to pop out when the moiré lattice constant is on the order of a few nm to 10 nm or so.  The upside of the moiré approach is that it can produce such an effective lattice over large areas with really good precision and uniformity (provided that the twist angle can really be controlled - see here and here, for example.)  You might imagine using lithography to make designer superlattices, but getting the kind of cleanliness and homogeneity at these very small length scales is very challenging.
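For a feel for the numbers (a standard small-angle estimate, my own illustration rather than anything from the post): two identical hexagonal lattices with lattice constant a, twisted by a small angle θ, give a moiré lattice constant of roughly a / (2 sin(θ/2)):

```python
import numpy as np

a_graphene = 0.246   # nm, graphene lattice constant (assumed value)

for theta_deg in (0.5, 1.1, 2.0, 5.0):
    theta = np.radians(theta_deg)
    moire = a_graphene / (2 * np.sin(theta / 2))
    print(f"{theta_deg:4.1f} deg twist -> moire lattice constant ~ {moire:4.1f} nm")
# Twist angles of roughly 1-5 degrees give the few-nm to ~10 nm scale mentioned above.
```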

It's not surprising, then, that people are interested in somehow applying superlattice potentials to nearby monolayer systems.  Earlier this year, Nature Materials ran three papers published sequentially in one issue on this topic, and this is the accompanying News and Views article.

  • In one approach, a MoSe2/WS2 bilayer is made and the charge in the bilayer is tuned so that the bilayer system is a Mott insulator, with charges localized at exactly the moiré lattice sites.  That results in an electrostatic potential that varies on the moiré lattice scale and can then influence a nearby monolayer, which then shows cool moiré/flat band physics itself.
  • Closely related, investigators used a small-angle twisted bilayer of graphene.  That provides a moiré periodic dielectric environment for a nearby single layer of WSe2.  They can optically excite Rydberg excitons in the WSe2, excitons that are comparatively big and puffy and thus quite sensitive to their dielectric environment.  
  • Similarly, twisted bilayer WS2 can be used to apply a periodic Coulomb potential to a nearby bilayer of graphene, resulting in correlated insulating states in the graphene that otherwise wouldn't be there.

Clearly this is a growth industry.  Clever, creative ways to introduce highly ordered superlattice potentials on very small lengthscales with other symmetries besides triangular lattices would be very interesting.

April 28, 2024

n-Category Café Line Bundles on Complex Tori (Part 3)

You thought this series was dead. But it was only dormant!

In Part 1, I explained how the classification of holomorphic line bundles on a complex torus X breaks into two parts:

  • the ‘discrete part’: their underlying topological line bundles are classified by elements of a free abelian group called the Néron–Severi group \mathrm{NS}(X).

  • the ‘continuous part’: the holomorphic line bundles with a given underlying topological line bundle are classified by elements of a complex torus called the Jacobian \mathrm{Jac}(X).

In Part 2, I explained duality for complex tori, which is a spinoff of duality for complex vector spaces. I used this to give several concrete descriptions of the Néron–Severi group NS(X).

But the fun for me lies in the examples. Today let’s actually compute a Néron–Severi group and begin seeing how it leads to this remarkable picture by Roice Nelson:

This is joint work with James Dolan.

The most interesting complex tori are the complex abelian varieties. These are not just complex manifolds: they’re projective varieties, so the ideas of algebraic geometry apply! To be precise, a complex abelian variety is an abelian group object in the category of smooth complex projective varieties.

If you want to learn the general theory, I recommend this:

  • Christina Birkenhake and Herbert Lange, Complex Abelian Varieties, Springer, Berlin, 2013.

It’s given me more pleasure than any book I’ve read for a long time. One reason is that it ties the theory nicely to ideas from physics, like the Heisenberg group and — without coming out and saying so — geometric quantization. Another is that abelian varieties are a charming, safe playground for beginners in algebraic geometry. You can easily compute things, classify things, and so on. It really amounts to linear algebra where all your vector spaces have lattices in them.

But instead of talking about general theorems, I’d like to look at an interesting example.

Every 1-dimensional complex torus can be made into an abelian variety: 1-dimensional abelian varieties are called elliptic curves, and everyone loves them. In higher dimensions the story is completely different: most complex tori can’t be made into abelian varieties! So, a lot of interesting phenomena are first seen in dimension 2. 2-dimensional abelian varieties are called complex abelian surfaces.

Here’s a cheap way to get our hands on an abelian surface: take the product of two elliptic curves. It’s tempting to use one of the two most symmetrical elliptic curves:

  • The Gaussian curve \mathbb{C}/\mathbb{G}, where

\mathbb{G} = \{ a + b i \; \vert \; a, b \in \mathbb{Z} \}

is called the ring of Gaussian integers because it’s the ring of algebraic integers in the field \mathbb{Q}[i].

  • The Eisenstein curve \mathbb{C}/\mathbb{E}, where

\mathbb{E} = \{ a + b \omega \; \vert \; a, b \in \mathbb{Z} \}

and \omega is the cube root of unity \exp(2 \pi i/ 3). \mathbb{E} is called the ring of Eisenstein integers because it’s the ring of algebraic integers in the field \mathbb{Q}[\omega].

The Gaussian integers form a square lattice:

while the Eisenstein integers form an equilateral triangular lattice:

There are no other lattices in the plane as symmetrical as these, though there are interesting runners-up coming from algebraic integers in other fields \mathbb{Q}[\sqrt{-n}].

Since the Eisenstein curve has 6-fold symmetry while the Gaussian curve has only 4-fold symmetry, let’s go all out and form an abelian surface by taking a product of two copies of the Eisenstein curve! I’ll call it the Eisenstein surface:

E = \mathbb{C}/\mathbb{E} \times \mathbb{C}/\mathbb{E} = \mathbb{C}^2/\mathbb{E}^2

What are the symmetries of this? Like any complex torus, it acts on itself by translations. These are incredibly important, but they don’t preserve the group structure because they move the origin around. When we talk about morphisms of abelian varieties, we usually mean maps of varieties that also preserve the group structure. So what are the automorphisms of E as an abelian variety?

Well, actually it’s nice to think about endomorphisms of E as an abelian variety. Suppose T \in \mathrm{M}_2(\mathbb{E}) is any 2 \times 2 matrix of Eisenstein integers. Then T acts on \mathbb{C}^2 in a linear way, by matrix multiplication. It obviously maps the lattice \mathbb{E}^2 \subset \mathbb{C}^2 to itself. So it defines an endomorphism of \mathbb{C}^2/\mathbb{E}^2. In other words, it gives an endomorphism of the Eisenstein surface as an abelian variety!

It’s not hard to see that these are all we get. So

\mathrm{End}(E) = \mathrm{M}_2(\mathbb{E})

Note that these endomorphisms form a ring: not only can you multiply them (i.e. compose them), you can also add them pointwise. Indeed any abelian variety has a ring of endomorphisms for the same reason, and these rings are very important in the overall theory.
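To make this concrete, here is a small sketch (mine, not from the post; the sample matrix and sample point are just illustrative) of how such a matrix acts on the Eisenstein surface: write each coordinate of \mathbb{C}^2/\mathbb{E}^2 as x + y\omega with x, y \in \mathbb{R}/\mathbb{Z}, multiply using \omega^2 = -1 - \omega, and reduce mod 1.

    from fractions import Fraction

    # An Eisenstein integer a + b*omega is stored as (a, b), where
    # omega = exp(2*pi*i/3) satisfies omega**2 = -1 - omega.

    def eis_mul(u, z):
        """Multiply the Eisenstein integer u = (a, b) by a point z = (x, y),
        both written in the basis {1, omega}."""
        a, b = u
        x, y = z
        # (a + b*omega)(x + y*omega) = (a*x - b*y) + (a*y + b*x - b*y)*omega
        return (a * x - b * y, a * y + b * x - b * y)

    def act(T, p):
        """Apply a 2x2 matrix T of Eisenstein integers to a point p of
        C^2/E^2, reducing each basis coordinate modulo 1."""
        out = []
        for i in range(2):
            x_tot, y_tot = Fraction(0), Fraction(0)
            for j in range(2):
                x, y = eis_mul(T[i][j], p[j])
                x_tot, y_tot = x_tot + x, y_tot + y
            out.append((x_tot % 1, y_tot % 1))
        return tuple(out)

    # The shear mentioned below, acting on a 3-torsion point of the surface.
    shear = [[(1, 0), (1, 0)],
             [(0, 0), (1, 0)]]
    p = ((Fraction(1, 3), Fraction(0)), (Fraction(0), Fraction(1, 3)))
    print(act(shear, p))   # another 3-torsion point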

Among the endomorphisms are the automorphisms, and I believe the automorphism group of the Eisenstein surface is

\mathrm{Aut}(E) = \mathrm{GL}(2,\mathbb{E})

This is an infinite group because it contains ‘shears’ like

\left( \begin{matrix} 1 & 1 \\ 0 & 1 \end{matrix} \right)

Now, what about line bundles on the Eisenstein surface E? Let’s sketch how to figure out its Néron–Severi group NS(E). Remember, this is a coarse classification of holomorphic line bundles where two count as the same if they are topologically isomorphic. Thus we get a discrete classification, not a ‘moduli space’.

I described the Néron–Severi group in a bunch of ways in Part 2. Here’s the one we want now. If X is a complex torus we can write

X = V/L

where V is a finite-dimensional complex vector space and L is a lattice in V. The vector space V has a dual V^\ast, defined in the usual way, and the lattice L \subset V also has a dual L^\ast \subset V^\ast, defined in a different way:

L^\ast = \{ f \colon V \to \mathbb{R} \; : \; f \ \text{is real-linear and} \ f(v) \in \mathbb{Z} \ \text{for all} \ v \in L \}

Then we saw something I called Theorem 2':

Theorem 2'. The Néron–Severi group NS(X) consists of linear maps h \colon V \to V^\ast that map L into L^\ast and have h^\ast = h.

The point here is that any linear map f \colon V \to W has an adjoint f^\ast \colon W^\ast \to V^\ast, so the map h has an adjoint h^\ast \colon V^{\ast\ast} \to V^\ast, but the double dual of V is canonically isomorphic to V itself, so with a nod and a wink we can write h^\ast \colon V \to V^\ast, so it makes sense to say h^\ast = h.

You may be slightly dazed now — are you seeing stars? Luckily, all of this becomes less confusing in our actual example where V = \mathbb{C}^2 and L = \mathbb{E}^2, since the standard inner product on \mathbb{C}^2 lets us identify this vector space with its dual, and — check this out, this part is not quite trivial — that lets us identify the lattice \mathbb{E} with its dual!

So, the Néron–Severi group NS(E) of the Eisenstein surface E = \mathbb{C}^2/\mathbb{E}^2 consists of 2 \times 2 complex matrices that map \mathbb{E}^2 to itself and are self-adjoint!

But it’s even simpler than that, since 2 \times 2 complex matrices that map \mathbb{E}^2 to itself are just 2 \times 2 matrices of Eisenstein integers. The set of these is our friend \mathrm{M}_2(\mathbb{E}). But now we want the self-adjoint ones. I’ll denote the set of these by \mathfrak{h}_2(\mathbb{E}). Here the gothic \mathfrak{h} stands for ‘hermitian’.

So we’ve figured out the Néron–Severi group of the Eisenstein surface. It consists of 2 \times 2 hermitian matrices of Eisenstein integers:

NS(E) = \mathfrak{h}_2(\mathbb{E}) \; !
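Spelled out a bit (my own unpacking, not something beyond what’s stated above): self-adjointness forces the diagonal entries to be real, and the only real Eisenstein integers are the ordinary integers, while the off-diagonal entry can be any Eisenstein integer. So a general element of \mathfrak{h}_2(\mathbb{E}) has the form

\left( \begin{matrix} a & \beta \\ \overline{\beta} & c \end{matrix} \right), \qquad a, c \in \mathbb{Z}, \quad \beta \in \mathbb{E}

with determinant a c - \beta \overline{\beta} = a c - |\beta|^2 and trace a + c.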

Now let’s try to visualize it.

The fun part

I’ll dig into this more next time, but let me state the marvelous facts now, just to whet your appetite. The space of all complex 2 \times 2 self-adjoint matrices, called \mathfrak{h}_2(\mathbb{C}), is famous in physics. It’s 4-dimensional — and it’s a nice way of thinking about Minkowski spacetime, our model of spacetime in special relativity.

Sitting inside Minkowski spacetime, we now see the lattice \mathfrak{h}_2(\mathbb{E}) of 2 \times 2 self-adjoint matrices with Eisenstein integer entries. It’s a very nice discretization of spacetime.

It’s a bit hard to visualize 4-dimensional things. So let’s look at 2 \times 2 self-adjoint matrices whose determinant is 1 and whose trace is positive. These form a 3-dimensional hyperboloid in Minkowski spacetime, called hyperbolic space. And it’s no hyperbole to say that this is a staggeringly beautiful alternative to 3-dimensional Euclidean space. It’s negatively curved, so lines that start out parallel get further and further apart in an exponential way as they march along. There’s a lot more room in hyperbolic space — a lot of room for fun.
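If you want to play along at home, here is a small brute-force sketch (mine, not from the post; the search bound is arbitrary) that lists lattice points of \mathfrak{h}_2(\mathbb{E}) lying on this hyperboloid, writing the off-diagonal entry as m + n\omega and using the norm |m + n\omega|^2 = m^2 - mn + n^2:

    from itertools import product

    def eisenstein_norm(m, n):
        """|m + n*omega|^2 for omega = exp(2*pi*i/3)."""
        return m * m - m * n + n * n

    def unit_det_points(bound):
        """Hermitian matrices [[a, beta], [conj(beta), c]] with a, c ordinary
        integers, beta = m + n*omega an Eisenstein integer, determinant
        a*c - |beta|^2 = 1 and trace a + c > 0: lattice points of h_2(E)
        lying on the hyperboloid."""
        points = []
        for a, c, m, n in product(range(-bound, bound + 1), repeat=4):
            if a + c > 0 and a * c - eisenstein_norm(m, n) == 1:
                points.append((a, c, m, n))
        return points

    pts = unit_det_points(3)
    print(len(pts), "lattice points found with entries bounded by 3")
    print(pts[:5])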

What happens if we look at points in our lattice \mathfrak{h}_2(\mathbb{E}) that happen to lie in hyperbolic space? I believe we get the centers of the hexagons in this picture:

And I believe the other features of this picture arise from other relationships between \mathfrak{h}_2(\mathbb{E}) and hyperbolic space. There’s a lot to check here. Greg Egan has made a lot of progress, but I’ll talk about that next time.

One last thing. I showed you that elements of \mathfrak{h}_2(\mathbb{E}) correspond to topological isomorphism classes of holomorphic line bundles on the Eisenstein surface. Then I showed you a cool picture of a subset of \mathfrak{h}_2(\mathbb{E}), namely the elements with determinant 1 and trace > 0. But what’s the importance of these? Am I focusing on them merely to get a charismatic picture in hyperbolic space?

No: it turns out that these elements correspond to something really nice: principal polarizations of the Eisenstein surface! These come from the very best line bundles, in a certain precise sense.

April 27, 2024

Clifford Johnson Living in the Matrix – Recent Advances in Understanding Quantum Spacetime

It has been extremely busy in the ten months or so since I last wrote something here. It’s perhaps the longest break I’ve taken from blogging for 20 years (gosh!) but I think it was a healthy thing to do. Many readers have been following some of my occasional scribblings … Click to continue reading this post

The post Living in the Matrix – Recent Advances in Understanding Quantum Spacetime appeared first on Asymptotia.

April 26, 2024

Tommaso Dorigo Shaping The Future Of AI For Fundamental Physics

From April 30 to May 3 more than 300 researchers in fundamental physics will gather in Amsterdam for the first edition of the EUCAIF conference, an initiative supported by the APPEC, NuPecc and ECFA consortia, which is meant to structure future European research activities in fundamental physics with Artificial Intelligence technologies.



Matt von Hippel The Quantum Paths Not Traveled

Before this week’s post: a former colleague of mine from CEA Paris-Saclay, Sylvain Ribault, posted a dialogue last week presenting different perspectives on academic publishing. One of the highlights of my brief time at the CEA was getting to chat with Sylvain and others about the future forms academia might take. He showed me a draft of his dialogue a while ago, designed as a way to introduce newcomers to the debate about how, and whether, academics should do peer review. I’ve got a different topic this week so I won’t say much more about it, but I encourage you to take a look!


Matt Strassler has a nice post up about waves and particles. He’s writing to address a common confusion between two concepts that sound very similar. On the one hand, there are the waves of quantum field theory, ripples in fundamental fields whose smallest versions correspond to particles. (Strassler likes to call them “wavicles”, to emphasize their wavy role.) On the other hand, there are the wavefunctions of quantum mechanics, descriptions of the behavior of one or more interacting particles over time. To distinguish them, he points out that wavicles can hurt you, while wavefunctions cannot. Wavicles are the things that collide and light up detectors, one by one; wavefunctions are the math that describes when and how that happens. Many types of wavicles can run into each other one by one, but their interactions can all be described together by a single wavefunction. It’s an important point, well stated.

(I do think he goes a bit too far in saying that the wavefunction is not “an object”, though. That smacks of metaphysics, and I think that’s not worth dabbling in for physicists.)

After reading his post, there’s something that might still confuse you. You’ve probably heard that in quantum mechanics, an electron is both a wave and a particle. Does the “wave” in that saying mean “wavicle”, or “wavefunction”?

A “wave” built out of particles

The gif above shows data from a double-slit experiment, an important type of experiment from the early days of quantum mechanics. These experiments were first conducted before quantum field theory (and thus, before the ideas that Strassler summarizes with “wavicles”). In a double-slit experiment, particles are shot at a screen through two slits. The particles that hit the screen can travel through one slit or the other.

A double-slit experiment, in diagram form

Classically, you would expect particles shot randomly at the screen to form two piles on the other side, one in front of each slit. Instead, they bunch up into a rippling pattern, the same sort of pattern that was used a century earlier to argue that light was a wave. The peaks and troughs of the wave pass through both slits, and either line up or cancel out, leaving the distinctive pattern.

When it was discovered that electrons do this too, it led to the idea that electrons must be waves as well, despite also being particles. That insight led to the concept of the wavefunction. So the “wave” in the saying refers to wavefunctions.

But electrons can hurt you, and as Strassler points out, wavefunctions cannot. So how can the electron be a wavefunction?

To risk a bit of metaphysics myself, I’ll just say: it can’t. An electron can’t “be” a wavefunction.

The saying, that electrons are both particles and waves, is from the early days of quantum mechanics, when people were confused about what it all meant. We’re still confused, but we have some better ways to talk about it.

As a start, it’s worth noticing that, whenever you measure an electron, it’s a particle. Each electron that goes through the slits hits your screen as a particle, a single dot. If you see many electrons at once, you may get the feeling that they look like waves. But every actual electron you measure, every time you’re precise enough to notice, looks like a particle. And for each individual electron, you can extrapolate back the path it took, exactly as if it traveled like a particle the whole way through.

The same is true, though, of light! When you see light, photons enter your eyes, and each one that you see triggers a chemical change in a molecule called a photopigment. The same sort of thing happens for photographic film, while in a digital camera an electrical signal gets triggered instead. Light may behave like a wave in some sense, but every time you actually observe it, it looks like a particle.

But while you can model each individual electron, or photon, as a classical particle, you can’t model the distribution of multiple electrons that way.

That’s because in quantum mechanics, the “paths not taken” matter. A single electron will only go through one slit in the double-slit experiment. But the fact that it could have gone through both slits matters, and changes the chance that it goes through each particular path. The possible paths in the wavefunction interfere with each other, the same way different parts of classical waves do.
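A toy way to see the difference (my own illustration, with made-up numbers): add the amplitudes for the two paths and then square, versus squaring each and then adding. Only the first gives fringes.

    import numpy as np

    # Two slits separated by d, screen at distance L, wavelength lam.
    # All numbers are arbitrary, purely for illustration.
    d, L, lam = 5.0, 1000.0, 1.0
    k = 2 * np.pi / lam
    x = np.linspace(-200, 200, 1001)               # positions on the screen

    r1 = np.sqrt(L**2 + (x - d / 2) ** 2)          # path length via slit 1
    r2 = np.sqrt(L**2 + (x + d / 2) ** 2)          # path length via slit 2
    psi1 = np.exp(1j * k * r1)                     # amplitude for "went through slit 1"
    psi2 = np.exp(1j * k * r2)                     # amplitude for "went through slit 2"

    quantum = np.abs(psi1 + psi2) ** 2             # amplitudes add, then square: fringes
    classical = np.abs(psi1) ** 2 + np.abs(psi2) ** 2   # probabilities add: no fringes

    print(quantum.max(), quantum.min())            # swings between roughly 4 and 0
    print(classical.max(), classical.min())        # flat at 2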

That role of the paths not taken, of the “what if”, is the heart and soul of quantum mechanics. No matter how you interpret its mysteries, “what if” matters. If you believe in a quantum multiverse, you think every “what if” happens somewhere in that infinity of worlds. If you think all that matters is observations, then “what if” shows the folly of modeling the world as anything else. If you are tempted to try to mend quantum mechanics with faster-than-light signals, then you have to declare one “what if” the true one. And if you want to double down on determinism and replace quantum mechanics, you need to declare that certain “what if” questions are off-limits.

“What if matters” isn’t the same as a particle traveling every path at once, it’s its own weird thing with its own specific weird consequences. It’s a metaphor, because everything written in words is a metaphor. But it’s a better metaphor than thinking an electron is both a particle and a wave.

April 25, 2024

Terence Tao Notes on the B+B+t theorem

A recent paper of Kra, Moreira, Richter, and Robertson established the following theorem, resolving a question of Erdös. Given a discrete amenable group {G = (G,+)}, and a subset {A} of {G}, we define the Banach density of {A} to be the quantity

\displaystyle  \sup_\Phi \limsup_{N \rightarrow \infty} |A \cap \Phi_N|/|\Phi_N|,

where the supremum is over all Følner sequences {\Phi = (\Phi_N)_{N=1}^\infty} of {G}. Given a set {B} in {G}, we define the restricted sumset {B \oplus B} to be the set of all sums {b_1+b_2} where {b_1, b_2} are distinct elements of {B}.
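To make these definitions concrete in the simplest case {G = {\bf Z}}, with intervals as Følner sets, here is a small illustration (mine, not from the post; the finite window maximum is only a crude stand-in for the supremum and limsup in the actual definition):

    def restricted_sumset(B):
        """B (+) B: all sums b1 + b2 of *distinct* elements of B."""
        B = sorted(set(B))
        return {b1 + b2 for i, b1 in enumerate(B) for b2 in B[i + 1:]}

    def window_density(A, N, max_start=2000):
        """Crude finite proxy for Banach density in Z: the best density of A
        in any length-N interval [s, s+N) with 0 <= s <= max_start."""
        A = set(A)
        return max(sum(1 for x in range(s, s + N) if x in A) / N
                   for s in range(max_start + 1))

    # Example: A = multiples of 3 has Banach density 1/3, and the infinite set
    # B = {3, 6, 12, 24, ...} (here truncated) satisfies B (+) B + 0 inside A.
    A = set(range(0, 10**5, 3))
    B = [3 * 2**k for k in range(10)]
    print(window_density(A, 1000))          # about 1/3
    print(restricted_sumset(B) <= A)        # True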

Theorem 1 Let {G} be a countably infinite abelian group with the index {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists an infinite set {B \subset A} and {t \in G} such that {B \oplus B + t \subset A}.

Strictly speaking, the main result of Kra et al. only claims this theorem for the case of the integers {G={\bf Z}}, but as noted in the recent preprint of Charamaras and Mountakis, the argument in fact applies for all countable abelian {G} in which the subgroup {2G := \{ 2x: x \in G \}} has finite index. This condition is in fact necessary (as observed by forthcoming work of Ethan Acklesberg): if {2G} has infinite index, then one can find a subgroup {H_j} of {G} of index {2^j} for any {j \geq 1} that contains {2G} (or equivalently, {G/H_j} is {2}-torsion). If one lets {y_1,y_2,\dots} be an enumeration of {G}, one can then check that the set

\displaystyle  A := G \backslash \bigcup_{j=1}^\infty (H_{j+1} + y_j) \backslash \{y_1,\dots,y_j\}

has positive Banach density, but does not contain any set of the form {B \oplus B + t} for any {t} (indeed, from the pigeonhole principle and the {2}-torsion nature of {G/H_{j+1}} one can show that {B \oplus B + y_j} must intersect {H_{j+1} + y_j \backslash \{y_1,\dots,y_j\}} whenever {B} has cardinality larger than {j 2^{j+1}}). It is also necessary to work with restricted sums {B \oplus B} rather than full sums {B+B}: a counterexample to the latter is provided for instance by the example with {G = {\bf Z}} and {A := \bigcup_{j=1}^\infty [10^j, 1.1 \times 10^j]}. Finally, the presence of the shift {t} is also necessary, as can be seen by considering the example of {A} being the odd numbers in {G ={\bf Z}}, though in the case {G=2G} one can of course delete the shift {t} at the cost of giving up the containment {B \subset A}.

Theorem 1 resembles other theorems in density Ramsey theory, such as Szemerédi’s theorem, but with the notable difference that the pattern located in the dense set {A} is infinite rather than merely arbitrarily large but finite. As such, it does not seem that this theorem can be proven by purely finitary means. However, one can view this result as the conjunction of an infinite number of statements, each of which is a finitary density Ramsey theory statement. To see this, we need some more notation. Observe from Tychonoff’s theorem that the collection {2^G := \{ B: B \subset G \}} is a compact topological space (with the topology of pointwise convergence) (it is also metrizable since {G} is countable). Subsets {{\mathcal F}} of {2^G} can be thought of as properties of subsets of {G}; for instance, the property of a subset {B} of {G} being finite is of this form, as is the complementary property of being infinite. A property of subsets of {G} can then be said to be closed or open if it corresponds to a closed or open subset of {2^G}. Thus, a property is closed if and only if it is closed under pointwise limits, and a property is open if, whenever a set {B} has this property, then any other set {B'} that shares a sufficiently large (but finite) initial segment with {B} will also have this property. Since {2^G} is compact and Hausdorff, a property is closed if and only if it is compact.

The properties of being finite or infinite are neither closed nor open. Define a smallness property to be a closed (or compact) property of subsets of {G} that is only satisfied by finite sets; the complement to this is a largeness property, which is an open property of subsets of {G} that is satisfied by all infinite sets. (One could also choose to impose other axioms on these properties, for instance requiring a largeness property to be an upper set, but we will not do so here.) Examples of largeness properties for a subset {B} of {G} include:

  • {B} has at least {10} elements.
  • {B} is non-empty and has at least {b_1} elements, where {b_1} is the smallest element of {B}.
  • {B} is non-empty and has at least {b_{b_1}} elements, where {b_n} is the {n^{\mathrm{th}}} element of {B}.
  • {T} halts when given {B} as input, where {T} is a given Turing machine that halts whenever given an infinite set as input. (Note that this encompasses the preceding three examples as special cases, by selecting {T} appropriately.)
We will call a set obeying a largeness property {{\mathcal P}} a {{\mathcal P}}-large set.
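For concreteness (my own illustration, treating {B} as a finite set of positive integers so that “smallest element” makes sense), the first three largeness properties above can be checked on a finite set as follows:

    def has_at_least_10(B):
        """Largeness property: B has at least 10 elements."""
        return len(B) >= 10

    def at_least_min_many(B):
        """B is non-empty and has at least b_1 elements, where b_1 = min(B)."""
        return len(B) > 0 and len(B) >= min(B)

    def at_least_b_b1_many(B):
        """B is non-empty and has at least b_{b_1} elements, where b_n is the
        n-th smallest element of B; a finite set with fewer than b_1
        elements fails the condition."""
        b = sorted(B)
        if not b:
            return False
        b1 = b[0]
        return len(b) >= b1 and len(b) >= b[b1 - 1]

    print(at_least_min_many({3, 5, 9}))      # True: min is 3 and there are 3 elements
    print(at_least_b_b1_many({3, 5, 9}))     # False: would need at least b_3 = 9 elements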

Theorem 1 is then equivalent to the following “almost finitary” version (cf. this previous discussion of almost finitary versions of the infinite pigeonhole principle):

Theorem 2 (Almost finitary form of main theorem) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {\Phi_n} be a Følner sequence in {G}, let {\delta>0}, and let {{\mathcal P}_t} be a largeness property for each {t \in G}. Then there exists {N} such that if {A \subset G} is such that {|A \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, then there exists a shift {t \in G} such that {A} contains a {{\mathcal P}_t}-large set {B} with {B \oplus B + t \subset A}.

Proof of Theorem 2 assuming Theorem 1. Let {G, \Phi_n}, {\delta}, {{\mathcal P}_t} be as in Theorem 2. Suppose for contradiction that Theorem 2 failed, then for each {N} we can find {A_N} with {|A_N \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, such that there is no {t} and {{\mathcal P}_t}-large {B} such that {B, B \oplus B + t \subset A_N}. By compactness, a subsequence of the {A_N} converges pointwise to a set {A}, which then has Banach density at least {\delta}. By Theorem 1, there is an infinite set {B} and a {t} such that {B, B \oplus B + t \subset A}. By openness, we conclude that there exists a finite {{\mathcal P}_t}-large set {B'} contained in {B}, thus {B', B' \oplus B' + t \subset A}. This implies that {B', B' \oplus B' + t \subset A_N} for infinitely many {N}, a contradiction.

Proof of Theorem 1 assuming Theorem 2. Let {G, A} be as in Theorem 1. If the claim failed, then for each {t}, the property {{\mathcal P}_t} of being a set {B} for which {B, B \oplus B + t \subset A} would be a smallness property. By Theorem 2, we see that there is a {t} and a {B} obeying the complement of this property such that {B, B \oplus B + t \subset A}, a contradiction.

Remark 3 Define a relation {R} between {2^G} and {2^G \times G} by declaring {A\ R\ (B,t)} if {B \subset A} and {B \oplus B + t \subset A}. The key observation that makes the above equivalences work is that this relation is continuous in the sense that if {U} is an open subset of {2^G \times G}, then the inverse image

\displaystyle R^{-1} U := \{ A \in 2^G: A\ R\ (B,t) \hbox{ for some } (B,t) \in U \}

is also open. Indeed, if {A\ R\ (B,t)} for some {(B,t) \in U}, then {B} contains a finite set {B'} such that {(B',t) \in U}, and then any {A'} that contains both {B'} and {B' \oplus B' + t} lies in {R^{-1} U}.

For each specific largeness property, such as the examples listed previously, Theorem 2 can be viewed as a finitary assertion (at least if the property is “computable” in some sense), but if one quantifies over all largeness properties, then the theorem becomes infinitary. In the spirit of the Paris-Harrington theorem, I would in fact expect some cases of Theorem 2 to be undecidable statements of Peano arithmetic, although I do not have a rigorous proof of this assertion.

Despite the complicated finitary interpretation of this theorem, I was still interested in trying to write the proof of Theorem 1 in some sort of “pseudo-finitary” manner, in which one can see analogies with finitary arguments in additive combinatorics. The proof of Theorem 1 that I give below the fold is my attempt to achieve this, although to avoid a complete explosion of “epsilon management” I will still use at one juncture an ergodic theory reduction from the original paper of Kra et al. that relies on such infinitary tools as the ergodic decomposition, the ergodic theorem, and the spectral theorem. Also some of the steps will be a little sketchy, and assume some familiarity with additive combinatorics tools (such as the arithmetic regularity lemma).

— 1. Proof of theorem —

The proof of Kra et al. proceeds by establishing the following related statement. Define a (length three) combinatorial Erdös progression to be a triple {(A,X_1,X_2)} of subsets of {G} such that there exists a sequence {n_j \rightarrow \infty} in {G} such that {A - n_j} converges pointwise to {X_1} and {X_1-n_j} converges pointwise to {X_2}. (By {n_j \rightarrow \infty}, we mean with respect to the cocompact filter; that is, that for any finite (or, equivalently, compact) subset {K} of {G}, {n_j \not \in K} for all sufficiently large {j}.)

Theorem 4 (Combinatorial Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists a combinatorial Erdös progression {(A,X_1,X_2)} with {0 \in X_1} and {X_2} non-empty.

Let us see how Theorem 4 implies Theorem 1. Let {G, A, X_1, X_2, n_j} be as in Theorem 4. By hypothesis, {X_2} contains an element {t} of {G}, thus {0 \in X_1} and {t \in X_2}. Setting {b_1} to be a sufficiently large element of the sequence {n_1, n_2, \dots}, we conclude that {b_1 \in A} and {b_1 + t \in X_1}. Setting {b_2} to be an even larger element of this sequence, we then have {b_2, b_2+b_1+t \in A} and {b_2 +t \in X_1}. Setting {b_3} to be an even larger element, we have {b_3, b_3+b_1+t, b_3+b_2+t \in A} and {b_3 + t \in X_1}. Continuing in this fashion we obtain the desired infinite set {B}.

It remains to establish Theorem 4. The proof of Kra et al. converts this to a topological dynamics/ergodic theory problem. Define a topological measure-preserving {G}-system {(X,T,\mu)} to be a compact space {X} equipped with a Borel probability measure {\mu} as well as a measure-preserving action {T} of {G} on {X} by homeomorphisms. A point {a} in {X} is said to be generic for {\mu} with respect to a Følner sequence {\Phi} if one has

\displaystyle  \int_X f\ d\mu = \lim_{N \rightarrow \infty} {\bf E}_{n \in \Phi_N} f(T^n a)

for all continuous {f: X \rightarrow {\bf C}}. Define a (length three) dynamical Erdös progression to be a tuple {(a,x_1,x_2)} in {X} with the property that there exists a sequence {n_j \rightarrow \infty} such that {T^{n_j} a \rightarrow x_1} and {T^{n_j} x_1 \rightarrow x_2}.

Theorem 4 then follows from

Theorem 5 (Dynamical Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {(X,T,\mu)} be a topological measure-preserving {G}-system, let {a} be a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi}, and let {E} be a positive measure open subset of {X}. Then there exists a dynamical Erdös progression {(a,x_1,x_2)} with {x_1 \in E} and {x_2 \in \bigcup_{t \in G} T^t E}.

Indeed, we can take {X} to be {2^G}, {a} to be {A}, {T} to be the shift {T^n B := B-n}, {E := \{ B \in 2^G: 0 \in B \}}, and {\mu} to be a weak limit of the {\mathop{\bf E}_{n \in \Phi_N} \delta_{A-n}} for a Følner sequence {\Phi_N} with {\lim_{N \rightarrow \infty} |A \cap \Phi_N| / |\Phi_N| > 0}, at which point Theorem 4 follows from Theorem 5 after chasing definitions. (It is also possible to establish the reverse implication, but we will not need to do so here.)

A remarkable fact about this theorem is that the point {a} need not be in the support of {\mu}! (In a related vein, the elements {\Phi_j} of the Følner sequence are not required to contain the origin.)

Using a certain amount of ergodic theory and spectral theory, Kra et al. were able to reduce this theorem to a special case:

Theorem 6 (Reduction) To prove Theorem 5, it suffices to do so under the additional hypotheses that {X} is ergodic, and there is a continuous factor map to the Kronecker factor. (In particular, the eigenfunctions of {X} can be taken to be continuous.)

We refer the reader to the paper of Kra et al. for the details of this reduction. Now we specialize for simplicity to the case where {G = {\bf F}_p^\omega = \bigcup_N {\bf F}_p^N} is a countable vector space over a finite field of size equal to an odd prime {p}, so in particular {2G=G}; we also specialize to Følner sequences of the form {\Phi_j = x_j + {\bf F}_p^{N_j}} for some {x_j \in G} and {N_j \geq 1}. In this case we can prove a stronger statement:

Theorem 7 (Odd characteristic case) Let {G = {\bf F}_p^\omega} for an odd prime {p}. Let {(X,T,\mu)} be a topological measure-preserving {G}-system with a continuous factor map to the Kronecker factor, and let {E_1, E_2} be open subsets of {X} with {\mu(E_1) + \mu(E_2) > 1}. Then if {a} is a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi_j = y_j + {\bf F}_p^{n_j}}, there exists an Erdös progression {(a,x_1,x_2)} with {x_1 \in E_1} and {x_2 \in E_2}.

Indeed, in the setting of Theorem 5 with the ergodicity hypothesis, the set {\bigcup_{t \in G} T^t E} has full measure, so the hypothesis {\mu(E_1)+\mu(E_2) > 1} of Theorem 7 will be verified in this case. (In the case of more general {G}, this hypothesis ends up being replaced with {\mu(E_1)/[G:2G] + \mu(E_2) > 1}; see Theorem 2.1 of this recent preprint of Kousek and Radic for a treatment of the case {G={\bf Z}} (but the proof extends without much difficulty to the general case).)

As with Theorem 1, Theorem 7 is still an infinitary statement and does not have a direct finitary analogue (though it can likely be expressed as the conjunction of infinitely many such finitary statements, as we did with Theorem 1). Nevertheless we can formulate the following finitary statement which can be viewed as a “baby” version of the above theorem:

Theorem 8 (Finitary model problem) Let {X = (X,d)} be a compact metric space, let {G = {\bf F}_p^N} be a finite vector space over a field of odd prime order. Let {T} be an action of {G} on {X} by homeomorphisms, let {a \in X}, and let {\mu} be the associated {G}-invariant measure {\mu = {\bf E}_{x \in G} \delta_{T^x a}}. Let {E_1, E_2} be subsets of {X} with {\mu(E_1) + \mu(E_2) > 1 + \delta} for some {\delta>0}. Then for any {\varepsilon>0}, there exist {x_1 \in E_1, x_2 \in E_2} such that

\displaystyle  |\{ h \in G: d(T^h a,x_1) \leq \varepsilon, d(T^h x_1,x_2) \leq \varepsilon \}| \gg_{p,\delta,\varepsilon,X} |G|.

The important thing here is that the bounds are uniform in the dimension {N} (as well as the initial point {a} and the action {T}).

Let us now give a finitary proof of Theorem 8. We can cover the compact metric space {X} by a finite collection {B_1,\dots,B_M} of open balls of radius {\varepsilon/2}. This induces a coloring function {\tilde c: X \rightarrow \{1,\dots,M\}} that assigns to each point in {X} the index {m} of the first ball {B_m} that covers that point. This then induces a coloring {c: G \rightarrow \{1,\dots,M\}} of {G} by the formula {c(h) := \tilde c(T^h a)}. We also define the pullbacks {A_i := \{ h \in G: T^h a \in E_i \}} for {i=1,2}. By hypothesis, we have {|A_1| + |A_2| > (1+\delta)|G|}, and it will now suffice by the triangle inequality to show that

\displaystyle  |\{ h \in G: c(h) = c(x_1); c(h+x_1)=c(x_2) \}| \gg_{p,\delta,M} |G|.

Now we apply the arithmetic regularity lemma of Green with some regularity parameter {\kappa>0} to be chosen later. This allows us to partition {G} into cosets of a subgroup {H} of index {O_{p,\kappa}(1)}, such that on all but {\kappa [G:H]} of these cosets {y+H}, all the color classes {\{x \in y+H: c(x) = c_0\}} are {\kappa^{100}}-regular in the Fourier ({U^2}) sense. Now we sample {x_1} uniformly from {G}, and set {x_2 := 2x_1}; as {p} is odd, {x_2} is also uniform in {G}. If {x_1} lies in a coset {y+H}, then {x_2} will lie in {2y+H}. By removing an exceptional event of probability {O(\kappa)}, we may assume that neither of these cosets {y+H}, {2y+H} is a bad coset. By removing a further exceptional event of probability {O_M(\kappa)}, we may also assume that {x_1} is in a popular color class of {y+H} in the sense that

\displaystyle  |\{ x \in y+H: c(x) = c(x_1) \}| \geq \kappa |H| \ \ \ \ \ (1)

since the set of exceptional {x_1} that fail to achieve this is only hit with probability {O(M\kappa)}. Similarly we may assume that

\displaystyle  |\{ x \in 2y+H: c(x) = c(x_2) \}| \geq \kappa |H|. \ \ \ \ \ (2)

Now we consider the quantity

\displaystyle  |\{ h \in y+H: c(h) = c(x_1); c(h+x_1)=c(x_2) \}|

which we can write as

\displaystyle  |H| {\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h) 1_{c^{-1}(c(x_2))}(h+x_1).

Both factors here are {O(\kappa^{100})}-uniform in their respective cosets. Thus by standard Fourier calculations, we see that after excluding another exceptional event of probability {O(\kappa)}, this quantity is equal to

\displaystyle  |H| (({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h)) ({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_2))}(h+x_1)) + O(\kappa^{10})).

By (1), (2), this expression is {\gg \kappa^2 |H| \gg_{p,\kappa} |G|}. By choosing {\kappa} small enough depending on {M,\delta}, we can ensure that {x_1 \in E_1} and {x_2 \in E_2}, and the claim follows.

Now we can prove the infinitary result in Theorem 7. Let us place a metric {d} on {X}. By sparsifying the Følner sequence {\Phi_j = y_j + {\bf F}_p^{N_j}}, we may assume that the {N_j} grow as fast as we wish. Once we do so, we claim that for each {J}, we can find {x_{1,J}, x_{2,J} \in X} such that for each {1 \leq j \leq J}, there exists {n_j \in \Phi_j} that lies outside of {{\bf F}_p^j} such that

\displaystyle  d(T^{n_j} a, x_{1,J}) \leq 1/j, \quad d(T^{n_j} x_{1,J}, x_{2,J}) \leq 1/j.

Passing to a subsequence to make {x_{1,J}, x_{2,J}} converge to {x_1, x_2} respectively, we obtain the desired Erdös progression.

Fix {J}, and let {M} be a large parameter (much larger than {J}) to be chosen later. By genericity, we know that the discrete measures {{\bf E}_{h \in \Phi_M} \delta_{T^h a}} converge vaguely to {\mu}, so any point in the support of {\mu} can be approximated by some point {T^h a} with {h \in \Phi_M}. Unfortunately, {a} does not necessarily lie in this support! (Note that {\Phi_M} need not contain the origin.) However, we are assuming a continuous factor map {\pi:X \rightarrow Z} to the Kronecker factor {Z}, which is a compact abelian group, and {\mu} pushes down to the Haar measure of {Z}, which has full support. In particular, the support of this pushforward contains {\pi(a)}. As a consequence, we can find {h_M \in \Phi_M} such that {\pi(T^{h_M} a)} converges to {\pi(a)}, even if we cannot ensure that {T^{h_M} a} converges to {a}. We are assuming that {\Phi_M} is a coset of {{\bf F}_p^{n_M}}, so now {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}} converges vaguely to {\mu}.

We make the random choice {x_{1,J} := T^{h_*+h_M} a}, {x_{2,J} := T^{2h_*+h_M} a}, where {h_*} is drawn uniformly at random from {{\bf F}_p^{n_M}}. This is not the only possible choice that can be made here, and is in fact not optimal in certain respects (in particular, it creates a fair bit of coupling between {x_{1,J}}, {x_{2,J}}), but is easy to describe and will suffice for our argument. (A more appropriate choice, closer to the arguments of Kra et al., would be to replace {x_{2,J}} in the above construction by {T^{2h_*+k_*+h_M} a}, where the additional shift {k_*} is a random variable in {{\bf F}_p^{n_M}} independent of {h_*} that is uniformly drawn from all shifts annihilated by the first {M} characters associated to some enumeration of the (necessarily countable) point spectrum of {T}, but this is harder to describe.)

Since we are in odd characteristic, the map {h \mapsto 2h} is a permutation on {h \in {\bf F}_p^{n_M}}, and so {x_{1,J}}, {x_{2,J}} are both distributed according to the law {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}}, though they are coupled to each other. In particular, by vague convergence (and inner regularity) we have

\displaystyle  {\bf P}( x_{1,J} \in E_1 ) \geq \mu(E_1) - o(1)

and

\displaystyle  {\bf P}( x_{2,J} \in E_2 ) \geq \mu(E_2) - o(1)

where {o(1)} denotes a quantity that goes to zero as {M \rightarrow \infty} (holding all other parameters fixed). By the hypothesis {\mu(E_1)+\mu(E_2) > 1}, we thus have

\displaystyle  {\bf P}( x_{1,J} \in E_1, x_{2,J} \in E_2 ) \geq \kappa - o(1) \ \ \ \ \ (3)

for some {\kappa>0} independent of {M}.

We will show that for each {1 \leq j \leq J}, one has

\displaystyle  |\{ h \in \Phi_j: d(T^{h} a,x_{1,J}) \leq 1/j, d(T^h x_{1,J},x_{2,J}) \leq 1/j \}| \ \ \ \ \ (4)

\displaystyle  \gg_{p,\kappa,j,X} (1-o(1)) |\Phi_j|

outside of an event of probability at most {\kappa/2^{j+1}+o(1)} (compare with Theorem 8). If this is the case, then by the union bound we can find (for {M} large enough) a choice of {x_{1,J}}, {x_{2,J}} obeying (3) as well as (4) for all {1 \leq j \leq J}. If the {N_j} grow fast enough, we can then ensure that for each {1 \leq j \leq J} one can find (again for {M} large enough) {n_j} in the set in (4) that avoids {{\bf F}_p^j}, and the claim follows.

It remains to show (4) outside of an exceptional event of acceptable probability. Let {\tilde c: X \rightarrow \{1,\dots,M_j\}} be the coloring function from the proof of Theorem 8 (with {\varepsilon := 1/j}). Then it suffices to show that

\displaystyle  |\{ h \in \Phi_j: c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \gg_{p,\kappa,M_j} (1-o(1)) |\Phi_j|

where {c_0(h) := \tilde c(T^h a)} and {c(h) := \tilde c(T^{h+h_M} a)}. This is a counting problem associated to the pattern {(h_*, h, h+h_*, 2h_*)}; if we concatenate the {h_*} and {2h_*} components of the pattern, this is a classic “complexity one” pattern, of the type that would be expected to be amenable to Fourier analysis (especially if one applies Cauchy-Schwarz to eliminate the {h_*} averaging and absolute value, at which point one is left with the {U^2} pattern {(h, h+h_*, h', h'+h_*)}).

In the finitary setting, we used the arithmetic regularity lemma. Here, we will need to use the Kronecker factor instead. The indicator function {1_{\tilde c^{-1}(i)}} of a level set of the coloring function {\tilde c} is a bounded measurable function of {X}, and can thus be decomposed into a function {f_i} that is measurable on the Kronecker factor, plus an error term {g_i} that is orthogonal to that factor and thus is weakly mixing in the sense that {|\langle T^h g_i, g_i \rangle|} tends to zero on average (or equivalently, that the Host-Kra seminorm {\|g_i\|_{U^2}} vanishes). Meanwhile, for any {\varepsilon > 0}, the Kronecker-measurable function {f_i} can be decomposed further as {P_{i,\varepsilon} + k_{i,\varepsilon}}, where {P_{i,\varepsilon}} is a bounded “trigonometric polynomial” (a finite sum of eigenfunctions) and {\|k_{i,\varepsilon}\|_{L^2} < \varepsilon}. The polynomial {P_{i,\varepsilon}} is continuous by hypothesis. The other two terms in the decomposition are merely measurable, but can be approximated to arbitrary accuracy by continuous functions. The upshot is that we can arrive at a decomposition

\displaystyle  1_{\tilde c^{-1}(i)} = P_{i,\varepsilon} + k_{i,\varepsilon,\varepsilon'} + g_{i,\varepsilon'}

(analogous to the arithmetic regularity lemma) for any {\varepsilon,\varepsilon'>0}, where {k_{i,\varepsilon,\varepsilon'}} is a bounded continuous function of {L^2} norm at most {\varepsilon}, and {g_{i,\varepsilon'}} is a bounded continuous function of {U^2} norm at most {\varepsilon'} (in practice we will take {\varepsilon'} much smaller than {\varepsilon}). Pulling back to {c}, we then have

\displaystyle  1_{c(h)=i} = P_{i,\varepsilon}(T^{h+h_M} a) + k_{i,\varepsilon,\varepsilon'}(T^{h+h_M}a) + g_{i,\varepsilon'}(T^{h+h_M}a). \ \ \ \ \ (5)

Let {\varepsilon,\varepsilon'>0} be chosen later. The trigonometric polynomial {h \mapsto P_{i,\varepsilon}(T^{h} a)} is just a sum of {O_{\varepsilon,M_j}(1)} characters on {G}, so one can find a subgroup {H} of {G} of index {O_{p,\varepsilon,M_j}(1)} such that these polynomials are constant on each coset of {H} for all {i}. Then {h_*} lies in some coset {a_*+H} and {2h_*} lies in the coset {2a_*+H}. We then restrict {h} to also lie in {a_*+H}, and we will show that

\displaystyle  |\{ h \in \Phi_j \cap (a_*+H): c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \ \ \ \ \ (6)

\displaystyle  \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an exceptional event of probability {\kappa/2+o(1)}, which will establish our claim because {\varepsilon} will ultimately be chosen to depend on {p,\kappa,M_j}.

The left-hand side can be written as

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} 1_{c(h+h_*)=i'}.

The coupling of the constraints {c(h_*)=i} and {c(2h_*)=i'} is annoying (as {(h_*,2h_*)} is an “infinite complexity” pattern that cannot be controlled by any uniformity norm), but (perhaps surprisingly) will not end up causing an essential difficulty to the argument, as we shall see when we start eliminating the terms in this sum one at a time starting from the right.

We decompose the {1_{c(h+h_*)=i'}} term using (5):

\displaystyle  1_{c(h+h_*)=i'} = P_{i',\varepsilon}(T^{h+h_*+h_M} a) + k_{i',\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a) + g_{i',\varepsilon'}(T^{h+h_*+h_M}a).

By Markov’s inequality, and removing an exceptional event of probability at most {\kappa/100}, we may assume that the {k_{i',\varepsilon,\varepsilon'}} have normalized {L^2} norm {O_{\kappa,M_j}(\varepsilon)} on both of these cosets {a_*+H, 2a_*+H}. As such, the contribution of {k_{i',\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a)} to (6) becomes negligible if {\varepsilon} is small enough (depending on {\kappa,p,M_j}). From the near weak mixing of the {g_{i,\varepsilon'}}, we know that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |\langle T^h g_{i,\varepsilon'}, g_{i,\varepsilon'} \rangle| \ll_{p,\varepsilon,M_j} \varepsilon'

for all {i}, if we choose {\Phi_j} large enough. By genericity of {a}, this implies that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |{\bf E}_{l \in {\bf F}_p^{n_M}} g_{i,\varepsilon'}(T^{h+l+h_M} a) g_{i,\varepsilon'}(T^{l+h_M} a)| \ll_{p,\varepsilon,M_j} \varepsilon' + o(1).

From this and standard Cauchy-Schwarz (or van der Corput) arguments we can then show that the contribution of the {g_{i',\varepsilon'}(T^{h+h_*+h_M}a)} to (6) is negligible outside of an exceptional event of probability at most {\kappa/100+o(1)}, if {\varepsilon'} is small enough depending on {\kappa,p,M_j,\varepsilon}. Finally, the quantity {P_{i',\varepsilon}(T^{h+h_*+h_M} a)} is independent of {h}, and in fact is equal up to negligible error to the density of {c^{-1}(i')} in the coset {{\bf F}_p^{M_j}(2a_*+H)}. This density will be {\gg_{p,\kappa,M_j}} except for those {i'} which would have made a negligible impact on (6) in any event due to the rareness of the event {c(2h_*)=i'} in such cases. As such, to prove (6) it suffices to show that

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an event of probability {\kappa/100+o(1)}. Now one can sum in {i'} to simplify the above estimate to

\displaystyle  \sum_{i} 1_{c(h_*)=i} (\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i}) / |\Phi_j \cap (a_*+H)| \gg_{\kappa,p,M_j} 1-o(1).

If {i} is such that {(\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i})/|\Phi_j \cap (a_*+H)|} is small compared with {p,\kappa,M_j}, then by genericity (and assuming {\Phi_j} large enough), the probability that {c(h_*)=i} will similarly be small (up to {o(1)} errors), and thus have a negligible influence on the above sum. As such, the above estimate simplifies to

\displaystyle  \sum_{i} 1_{c(h_*)=i} \gg_{\kappa,p,M_j} 1-o(1).

But the left-hand side sums to one, and the claim follows.

April 24, 2024

Scott Aaronson My Passover press release

FOR IMMEDIATE RELEASE – From the university campuses of Assyria to the thoroughfares of Ur to the palaces of the Hittite Empire, students across the Fertile Crescent have formed human chains, camel caravans, and even makeshift tent cities to protest the oppression of innocent Egyptians by the rogue proto-nation of “Israel” and its vengeful, warlike deity Yahweh. According to leading human rights organizations, the Hebrews, under the leadership of a bearded extremist known as Moses or “Genocide Moe,” have unleashed frogs, wild beasts, hail, locusts, cattle disease, and other prohibited collective punishments on Egypt’s civilian population, regardless of the humanitarian cost.

Human-rights expert Asenath Albanese says that “under international law, it is the Hebrews’ sole responsibility to supply food, water, and energy to the Egyptian populace, just as it was their responsibility to build mud-brick store-cities for Pharoah. Turning the entire Nile into blood, and plunging Egypt into neverending darkness, are manifestly inconsistent with the Israelites’ humanitarian obligations.”

Israelite propaganda materials have held these supernatural assaults to be justified by Pharoah’s alleged enslavement of the Hebrews, as well as unverified reports of his casting all newborn Hebrew boys into the Nile. Chanting “Let My People Go,” some Hebrew counterprotesters claim that Pharoah could end the plagues at any time by simply releasing those held in bondage.

Yet Ptahmose O’Connor, Chair of Middle East Studies at the University of Avaris, retorts that this simplistic formulation ignores the broader context. “Ever since Joseph became Pharoah’s economic adviser, the Israelites have enjoyed a position of unearned power and privilege in Egypt. Through underhanded dealings, they even recruited the world’s sole superpower—namely Adonai, Creator of the Universe—as their ally, removing any possibility that Adonai could serve as a neutral mediator in the conflict. As such, Egypt’s oppressed have a right to resist their oppression by any means necessary. This includes commonsense measures like setting taskmasters over the Hebrews to afflict them with heavy burdens, and dealing shrewdly with them lest they multiply.”

Professor O’Connor, however, dismissed the claims of drowned Hebrew babies as unverified rumors. “Infanticide accusations,” he explained, “have an ugly history of racism, Orientalism, and Egyptophobia. Therefore, unless you’re a racist or an Orientalist, the only possible conclusion is that no Hebrew babies have been drowned in the Nile, except possibly by accident, or of course by Hebrews themselves looking for a pretext to start this conflict.”

Meanwhile, at elite academic institutions across the region, the calls for justice have been deafening. “From the Nile to the Sea of Reeds, free Egypt from Jacob’s seeds!” students chanted. Some protesters even taunted passing Hebrew slaves with “go back to Canaan!”, though others were quick to disavow that message. According to Professor O’Connor, it’s important to clarify that the Hebrews don’t belong in Canaan either, and that finding a place where they do belong is not the protesters’ job.

In the face of such stridency, a few professors and temple priests have called the protests anti-Semitic. The protesters, however, dismiss that charge, pointing as proof to the many Hebrews and other Semitic peoples in their own ranks. For example, Sa-Hathor Goldstein, who currently serves as Pithom College’s Chapter President of Jews for Pharoah, told us that “we stand in solidarity with our Egyptian brethren, with the shepherds, goat-workers, and queer and mummified voices around the world. And every time Genocide Moe strikes down his staff to summon another of Yahweh’s barbaric plagues, we’ll be right there to tell him: Not In Our Name!”

“Look,” Goldstein added softly, “my own grandparents were murdered by Egyptian taskmasters. But the lesson I draw from my family’s tragic history is to speak up for oppressed people everywhere—even the ones who are standing over me with whips.”

“If Yahweh is so all-powerful,” Goldstein went on to ask, “why could He not devise a way to free the Israelites without a single Egyptian needing to suffer? Why did He allow us to become slaves in the first place? And why, after each plague, does He harden Pharoah’s heart against our release? Not only does that tactic needlessly prolong the suffering of Israelites and Egyptians alike, it also infringes on Pharoah’s bodily autonomy.”

But the strongest argument, Goldstein concluded, arching his eyebrow, is that “ever since I started speaking out on this issue, it’s been so easy to get with all the Midianite chicks at my school. That’s because they, like me, see past the endless intellectual arguments over ‘who started’ or ‘how’ or ‘why’ to the emotional truth that the suffering just has to stop, man.”

Last night, college towns across the Tigris, Euphrates, and Nile were aglow with candlelight vigils for Baka Ahhotep, an Egyptian taskmaster and beloved father of three cruelly slain by “Genocide Moe,” in an altercation over alleged mistreatment of a Hebrew slave whose details remain disputed.

According to Caitlyn Mentuhotep, a sophomore majoring in hieroglyphic theory at the University of Pi-Ramesses who attended her school’s vigil for Ahhotep, staying true to her convictions hasn’t been easy in the face of Yahweh’s unending plagues—particularly the head lice. “But what keeps me going,” she said, “is the absolute certainty that, when people centuries from now write the story of our time, they’ll say that those of us who stood with Pharoah were on the right side of history.”

Have a wonderful holiday!

Terence Tao Erratum for “An inverse theorem for the Gowers U^{s+1}[N]-norm”

The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers {U^{s+1}[N]}-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded with stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, they discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in this followup work which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.

Excluding some minor (mostly typographical) issues which we also have reported in this erratum, the main issues stemmed from a conflation of two notions of a degree {s} filtration

\displaystyle  G = G_0 \geq G_1 \geq \dots \geq G_s \geq G_{s+1} = \{1\}

of a group {G}, which is a nested sequence of subgroups that obey the relation {[G_i,G_j] \leq G_{i+j}} for all {i,j}. The weaker notion (sometimes known as a prefiltration) permits the group {G_1} to be strictly smaller than {G_0}, while the stronger notion requires {G_0} and {G_1} to be equal. In practice, one can often move between the two concepts, as {G_1} is always normal in {G_0}, and a prefiltration behaves like a filtration on every coset of {G_1} (after applying a translation and perhaps also a conjugation). However, we did not clarify this issue sufficiently in the paper, and there are some places in the text where results that were only proven for filtrations were applied for prefiltrations. The erratum fixes these issues, mostly by clarifying that we work with filtrations throughout (which requires some decomposition into cosets in places where prefiltrations are generated). Similar adjustments need to be made for multidegree filtrations and degree-rank filtrations, which we also use heavily in our paper.

In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group {G_s} was a central subgroup, which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter {h}).

Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.

April 23, 2024

n-Category Café Counting Points on Elliptic Curves (Part 3)

In Part 1 of this little series I showed you Wikipedia’s current definition of the L-function of an elliptic curve, and you were supposed to shudder in horror. In this definition the L-function is a product over all primes p. But what do we multiply in this product? There are 4 different cases, each with its own weird and unmotivated formula!

In Part 2 we studied the 4 cases. They correspond to 4 things that can happen when we look at our elliptic curve over the finite field \mathbb{F}_{p}: it can stay smooth, or it can become singular in 3 different ways. In each case we got a formula for the number of points of the resulting curve over the fields \mathbb{F}_{p^k}.

Now I’ll give a much better definition of the L-function of an elliptic curve. Using our work from last time, I’ll show that it’s equivalent to the horrible definition on Wikipedia. And eventually I may get up the nerve to improve the Wikipedia definition. Then future generations will wonder what I was complaining about.

I want to explain the LL-function of an elliptic curve as simply as possible — thus, with a minimum of terminology and unmotivated nonsense.

The LL-function of an elliptic curve is a slight tweak of something more fundamental: its zeta function. So we have to start there.

The zeta function of an elliptic curve

You can define the zeta function of any gadget S that assigns a finite set S(R) to any finite commutative ring R. It goes like this:

\zeta_S(s) = \sum_{n = 1}^\infty \frac{|Z_S(n)|}{n!} n^{-s}

where s is a complex number and the sum will converge if Re(s) is big enough.

What’s Z_S(n)? A ring that’s a finite product of finite fields is called a finite semisimple commutative ring. An element of Z_S(n) is a way to make the set {1, …, n} into a finite semisimple commutative ring, say R, and choose an element of S(R).

So, to define the zeta function of an elliptic curve, we just need a way for an elliptic curve E to assign a finite set E(R) to any finite semisimple commutative ring R. This is not hard. By an elliptic curve I simply mean an equation

y^2 = P(x)

where P is a cubic polynomial with integer coefficients and distinct roots. When R is a finite field, this equation will have a finite set of solutions in R, and we take those and one extra ‘point at infinity’ to be the points of our set E(R). When R is a general finite semisimple commutative ring, it’s a product of finite fields, say

R \cong F_1 \times \cdots \times F_n

and we define

E(R) = E(F_1) \times \cdots \times E(F_n)

Then the zeta function of our elliptic curve E is

\zeta_E(s) = \sum_{n = 1}^\infty \frac{|Z_E(n)|}{n!} n^{-s}
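If you want to play with this, counting E(𝔽_p) for small primes is easy to do by brute force. Here is a minimal Python sketch; the curve y^2 = x^3 + x + 1 and the primes in it are just arbitrary examples, nothing special:

```python
# Brute-force count of |E(F_p)|: the solutions of y^2 = P(x) over F_p,
# plus one extra point at infinity.
def count_points(p, P):
    count = 1  # the point at infinity
    for x in range(p):
        for y in range(p):
            if (y * y - P(x)) % p == 0:
                count += 1
    return count

P = lambda x: x**3 + x + 1   # an example cubic with distinct roots
for p in [3, 5, 7, 11, 13]:
    print(p, count_points(p, P))
```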

The L-function of an elliptic curve

Later today we will calculate the zeta function of an elliptic curve. And we’ll see that it always has a special form:

\zeta_E(s) = \frac{ \zeta(s) \zeta(s - 1)}{some \; rational \; function \; of \; s}

where ζ is the Riemann zeta function. The denominator here is called the L-function of our elliptic curve, L(E,s). That’s all there is to it!

In short:

L(E,s) = \frac{ \zeta(s) \zeta(s - 1)}{\zeta_E(s)}

You should think of the L-function as the ‘interesting part’ of the zeta function of the elliptic curve — but flipped upside down, just to confuse amateurs. That’s also why we write n^{-s} in the formula for the zeta function instead of n^s: it’s a deliberately unnatural convention designed to keep out the riff-raff.

Arbitrary conventions aside, I hope you see the L-function of an elliptic curve is a fairly simple thing. You might wonder why the zeta function is defined as it is, and why the zeta function of the elliptic curve has a factor of ζ(s)ζ(s−1) in it. Those are very good questions, with good answers. But my point is this: all the gory complexity of the L-function arises when we actually try to compute it more explicitly.

Now let’s do that.

The Euler product formula

An elliptic curve E gives a finite set E(R) for each finite semisimple commutative ring R. We need to count these sets to compute the zeta function or L-function of our elliptic curve. But we have set things up so that

E(R \times R') \cong E(R) \times E(R')

Since every finite semisimple commutative ring is a product of finite fields, this lets us focus on counting E(R) when R is a finite field. And since every finite field has a prime power number of elements, we can tackle this counting problem ‘one prime at a time’.

If we carry this through, we get an interesting formula for the zeta function of an elliptic curve. In fact it’s a very general thing:

Euler Product Formula. Suppose S is any functor from finite commutative rings to finite sets such that S(R \times R') \cong S(R) \times S(R'). Then

\zeta_S(s) = \prod_p \exp \left( \sum_{k = 1}^\infty \frac{|S(\mathbb{F}_{p^k})|}{k} p^{-k s} \right)

where we take the product over all primes p, and 𝔽_{p^k} is the field with p^k elements.

I wrote up a proof here:

so check it out if you want. I was not trying to make the argument look as simple as possible, but it’s really quite easy given what I’ve said: you can probably work it out yourself.

So: the zeta function of an elliptic curve E is a product over primes. The factor for the prime p is called the local zeta function

Z_p(E,s) = \exp \left( \sum_{k = 1}^\infty \frac{|E(\mathbb{F}_{p^k})|}{k} p^{-k s} \right)

To compute this, we need to know the numbers |E(𝔽_{p^k})|. Luckily we worked these out last time! But there are four cases.

In every case we have

|E(\mathbb{F}_{p^k})| = p^k + 1 + c(p,k)

where c(p,k) is some sort of ‘correction’. If the correction c(p,k) is zero, we get

\begin{array}{ccl} Z_p(E,s) &=& \displaystyle{ \exp \left(\sum_{k = 1}^\infty \frac{p^k + 1}{k} p^{-k s} \right) } \\ \\ &=& \displaystyle{ \exp \left( -\ln(1 - p^{-s + 1}) - \ln(1 - p^{-s}) \right) } \\ \\ &=& \displaystyle{ \frac{1}{(1 - p^{-s + 1})(1 - p^{-s}) } } \end{array}

I did the sum pretty fast, but not because I’m good at sums — merely to keep you from getting bored. To do it yourself, all you need to know is the Taylor series for the logarithm.
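Or, if you would rather let a computer check it, truncating the sum at a large cutoff works fine. This little sketch (the values of p, s and the cutoff K are arbitrary choices, not anything from the post) compares the truncated exponential sum with the closed form:

```python
import math

# Compare exp( sum_{k=1}^K (p^k + 1)/k * p^(-k s) ) with 1 / ((1 - p^(1-s)) (1 - p^(-s)))
p, s, K = 5, 3.0, 100   # an arbitrary prime, a value of s with Re(s) large, and a cutoff
series = sum((p**k + 1) / k * p**(-k * s) for k in range(1, K + 1))
print(math.exp(series))
print(1 / ((1 - p**(1 - s)) * (1 - p**(-s))))
# the two printed numbers agree to many decimal places
```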

To get the zeta function of our elliptic curve we multiply all the local zeta functions Z_p(E,s). So if all the corrections c(p,k) were zero, we’d get

\zeta_E(s) = \prod_p \frac{1}{1 - p^{-s + 1}} \prod_p \frac{1}{1 - p^{-s} } = \zeta(s-1) \zeta(s)

Here I used the Euler product formula for the Riemann zeta function.

This is precisely why folks define the L-function of an elliptic curve to be

L(E,s)^{-1} = \frac{\zeta_E(s)}{ \zeta(s) \zeta(s - 1)}

It lets us focus on the effect of the corrections! Well, it doesn’t explain that stupid reciprocal on the left-hand side, which is just a convention — but apart from that, we’re taking the zeta function of the elliptic curve and dividing out by what we’d get if all the corrections c(p,k) were zero. So, if you think about it a bit, we have

L(E,s)^{-1} = \prod_p \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s} \right)

It’s like the Euler product formula for the zeta function, but using only the corrections c(p,k) instead of the full count of points |E(𝔽_{p^k})|.

As you can see, the L-function is a product of local L-functions

L_p(E,s)^{-1} = \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s}\right)

So let’s work those out! There are four cases.

The local zeta function of an elliptic curve: additive reduction

If our elliptic curve gets a cusp over 𝔽_p, we say it has additive reduction. In this case we saw in Theorem 2 last time that

|E(\mathbb{F}_{p^k})| = p^k + 1

So in this case the correction vanishes:

c(p,k) = 0

This makes the local L-function very simple:

L_p(E,s)^{-1} = \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s}\right) = 1

The local zeta function of an elliptic curve: split multiplicative reduction

If our elliptic curve gets a node over 𝔽_p and the two lines tangent to this node have slopes defined in 𝔽_p, we say our curve has split multiplicative reduction. In this case we saw in Theorem 3 last time that

|E(\mathbb{F}_{p^k})| = p^k

So in this case, the correction is −1:

c(p,k) = -1

This gives

\begin{array}{ccl} L_p(E,s)^{-1} &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{1}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln(1 - p^{-s}) \right) } \\ \\ &=& 1 - p^{-s} \end{array}

Again I used my profound mastery of Taylor series of the logarithm to do the sum.

The local zeta function of an elliptic curve: nonsplit multiplicative reduction

If our elliptic curve gets a node over 𝔽_p and the two lines tangent to this node have slopes that are not defined in 𝔽_p, we say our curve has nonsplit multiplicative reduction. In this case we saw in Theorem 4 last time that

|E(\mathbb{F}_{p^k})| = p^k + 1 - (-1)^k

In this case the correction is more interesting:

c(p,k) = -(-1)^k

This gives

\begin{array}{ccl} L_p(E,s)^{-1} &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{(-1)^k}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln(1 + p^{-s}) \right) } \\ \\ &=& 1 + p^{-s} \end{array}

Again, I just used the Taylor series of the log function.

The local zeta function of an elliptic curve: good reduction

If our elliptic curve stays smooth over 𝔽_p, we say it has good reduction. Ironically this gives the most complicated local zeta function. In Theorem 1 last time we saw

|E(\mathbb{F}_{p^k})| = p^k - \alpha^k - \overline{\alpha}^k + 1

where α is a complex number with αᾱ = p. We didn’t prove this, we literally just saw it: it’s a fairly substantial result due to Hasse.

So, in this case the correction is

c(p,k) = -\alpha^k - \overline{\alpha}^k

This gives

\begin{array}{ccl} L_p(E,s)^{-1} &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{\alpha^k + \overline{\alpha}^k}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln\left(1 - \alpha p^{-s}\right) \; + \; \ln\left(1 - \overline{\alpha} p^{-s}\right) \right) } \\ \\ &=& (1 - \alpha p^{-s})(1 - \overline{\alpha} p^{-s}) \end{array}

Again I just used the Taylor series of the log function. I’m sure glad I went to class that day.

But we can get a bit further using αᾱ = p:

\begin{array}{ccl} L_p(E,s)^{-1} &=& (1 - \alpha p^{-s})(1 - \overline{\alpha} p^{-s}) \\ &=& 1 - (\alpha + \overline{\alpha})p^{-s} + p^{1-2s} \end{array}

At this point people usually notice that

|E(\mathbb{F}_{p})| = p - \alpha - \overline{\alpha} + 1

so

\alpha + \overline{\alpha} = p + 1 - |E(\mathbb{F}_{p})|

Thus, you can compute this number using just the number of points of our curve over 𝔽_p. And to be cute, people call this number something like a_p(E). So in the end, for elliptic curves of good reduction over the prime p we have

L_p(E,s)^{-1} = 1 - a_p(E) p^{-s} + p^{1-2s}
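To make this concrete, here is a small Python sketch that computes a_p(E) by brute-force point counting and then writes down this local factor. The curve y^2 = x^3 + x + 1 is again just an example; it has good reduction at the odd primes used below:

```python
# a_p(E) = p + 1 - |E(F_p)|, and the local factor 1 - a_p(E) p^{-s} + p^{1-2s}
def a_p(p, P):
    points = 1 + sum(1 for x in range(p) for y in range(p)
                     if (y * y - P(x)) % p == 0)   # affine points plus infinity
    return p + 1 - points

def local_factor(p, s, P):
    return 1 - a_p(p, P) * p**(-s) + p**(1 - 2 * s)

P = lambda x: x**3 + x + 1
print({p: a_p(p, P) for p in [3, 5, 7, 11, 13]})
print(local_factor(5, 2.0, P))
```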

Whew, we’re done!

The L-function of an elliptic curve, revisited

Okay, now we can summarize all our work in an explicit formula for the L-function of an elliptic curve.

Theorem. The L-function of an elliptic curve E equals

L(E,s) = \prod_p L_p(E,s)^{-1}

where:

1) L_p(E,s) = 1 - a_p(E) p^{-s} + p^{1-2s} if E remains smooth over 𝔽_p. Here a_p(E) is p + 1 minus the number of points of E over 𝔽_p.

2) L_p(E,s) = 1 if E gets a cusp over 𝔽_p.

3) L_p(E,s) = 1 - p^{-s} if E gets a node over 𝔽_p, and the two tangent lines to this node have slopes that are defined in 𝔽_p.

4) L_p(E,s) = 1 + p^{-s} if E gets a node over 𝔽_p, but the two tangent lines to this node have slopes that are not defined in 𝔽_p.

My god! This is exactly what I showed you in Part 1. So this rather elaborate theorem is what some people run around calling the definition of the L-function of an elliptic curve!

n-Category Café Moving On From Kent

Was it really seventeen years ago that John broke the news on this blog that I had finally landed a permanent academic job? That was a long wait – I’d had twelve years of temporary contracts after receiving my PhD.

And now it has been decided that I am to move on from the University of Kent. The University is struggling financially and has decreed that a number of programs, including Philosophy, are to be cut. Whatever the wisdom of their plan, my time here comes to an end this July.

What next? It’s a little early for me to retire. If anyone has suggestions, I’d be happy to hear them.

We started this blog just one year before I started at Kent. To help think things over, in the coming weeks I thought I’d revisit some themes developed here over the years to see how they panned out:

  1. Higher geometry: categorifying the Erlanger program
  2. Category theory meets machine learning
  3. Duality
  4. Categorifying logic
  5. Category theory applied to philosophy
  6. Rationality of (mathematical and scientific) theory change as understood through historical development

April 19, 2024

Scott Aaronson That IACR preprint

Update (April 19): Apparently a bug has been found, and the author has withdrawn the claim (see the comments).


For those who don’t yet know from their other social media: a week ago the cryptographer Yilei Chen posted a preprint, eprint.iacr.org/2024/555, claiming to give a polynomial-time quantum algorithm to solve lattice problems. For example, it claims to solve the GapSVP problem, which asks to approximate the length of the shortest nonzero vector in a given n-dimensional lattice, to within an approximation ratio of ~n^4.5. The best approximation ratio previously known to be achievable in classical or quantum polynomial time was exponential in n.

If it’s correct, this is an extremely big deal. It doesn’t quite break the main lattice-based cryptosystems, but it would put those cryptosystems into a precarious position, vulnerable to a mere further polynomial improvement in the approximation factor. And, as we learned from the recent NIST competition, if the lattice-based and LWE-based systems were to fall, then we really don’t have many great candidates left for post-quantum public-key cryptography! On top of that, a full quantum break of LWE (which, again, Chen is not claiming) would lay waste (in a world with scalable QCs, of course) to a large fraction of the beautiful sandcastles that classical and quantum cryptographers have built up over the last couple decades—everything from Fully Homomorphic Encryption schemes, to Mahadev’s protocol for proving the output of any quantum computation to a classical skeptic.

So on the one hand, this would substantially enlarge the scope of exponential quantum speedups beyond what we knew a week ago: yet more reason to try to build scalable QCs! But on the other hand, it could also fuel an argument for coordinating to slow down the race to scalable fault-tolerant QCs, until the world can get its cryptographic house into better order. (Of course, as we’ve seen with the many proposals to slow down AI scaling, this might or might not be possible.)

So then, is the paper correct? I don’t know. It’s very obviously a serious effort by a serious researcher, a world away from the P=NP proofs that fill my inbox every day. But it might fail anyway. I’ve asked the world experts in quantum algorithms for lattice problems, and they’ve been looking at it, and none of them is ready yet to render a verdict. The central difficulty is that the algorithm is convoluted, and involves new tools that seem to come from left field, including complex Gaussian functions, the windowed quantum Fourier transform, and Karst waves (whatever those are). The algorithm has 9 phases by the author’s count. In my own perusal, I haven’t yet extracted even a high-level intuition—I can’t tell any little story like for Shor’s algorithm, e.g. “first you reduce factoring to period-finding, then you solve period-finding by applying a Fourier transform to a vector of amplitudes.”

So, the main purpose of this post is simply to throw things open to commenters! I’m happy to provide a public clearinghouse for questions and comments about the preprint, if those studying it would like that. You can even embed LaTeX in your comments, as will probably be needed to get anywhere.


Unrelated Update: Connor Tabarrok and his friends just put a podcast with me up on YouTube, in which they interview me in my office at UT Austin about watermarking of large language models and other AI safety measures.

Matt von HippelNo Unmoved Movers

Economists must find academics confusing.

When investors put money in a company, they have some control over what that company does. They vote to decide a board, and the board votes to hire a CEO. If the company isn’t doing what the investors want, the board can fire the CEO, or the investors can vote in a new board. Everybody is incentivized to do what the people who gave the money want to happen. And usually, those people want the company to increase its profits, since most of them are companies with their own investors.

Academics are paid by universities and research centers, funded in the aggregate by governments and student tuition and endowments from donors. But individually, they’re also often funded by grants.

What grant-givers want is more ambiguous. The money comes in big lumps from governments and private foundations, which generally want something vague like “scientific progress”. The actual decisions about who gets the money are made by committees made up of senior scientists. These people aren’t experts in every topic, so they have to extrapolate, much as investors have to guess whether a new company will be profitable based on past experience. At their best, they use their deep familiarity with scientific research to judge which projects are most likely to work, and which have the most interesting payoffs. At their weakest, though, they stick with ideas they’ve heard of, things they know work because they’ve seen them work before. That, in a nutshell, is why mainstream research prevails: not because the mainstream wants to suppress alternatives, but because sometimes the only way to guess if something will work is raw familiarity.

(What “works” means is another question. The cynical answers are “publishes papers” or “gets citations”, but that’s a bit unfair: in Europe and the US, most funders know that these numbers don’t tell the whole story. The trivial answer is “achieves what you said it would”, but that can’t be the whole story, because some goals are more pointless than others. You might want the answer to be “benefits humanity”, but that’s almost impossible to judge. So in the end the answer is “sounds like good science”, which is vulnerable to all the fads you can imagine…but is pretty much our only option, regardless.)

So are academics incentivized to do what the grant committees want? Sort of.

Science never goes according to plan. Grant committees are made up of scientists, so they know that. So while many grants have a review process afterwards to see whether you achieved what you planned, they aren’t all that picky about it. If you can tell a good story, you can explain why you moved away from your original proposal. You can say the original idea inspired a new direction, or that it became clear that a new approach was necessary. I’ve done this with an EU grant, and they were fine with it.

Looking at this, you might imagine that an academic who’s a half-capable storyteller could get away with anything they wanted. Propose a fashionable project, work on what you actually care about, and tell a good story afterwards to avoid getting in trouble. As long as you’re not literally embezzling the money (the guy who was paying himself rent out of his visitor funding, for instance), what could go wrong? You get the money without the incentives, you move the scientific world and nobody gets to move you.

It’s not quite that easy, though.

Sabine Hossenfelder told herself she could do something like this. She got grants for fashionable topics she thought were pointless, and told herself she’d spend time on the side on the things she felt were actually important. Eventually, she realized she wasn’t actually doing the important things: the faddish research ended up taking all her time. Not able to get grants doing what she actually cared about (and stuck in one of those weird temporary European positions that only last until you run out of grants), she now has to make a living from her science popularization work.

I can’t speak for Hossenfelder, but I’ve also put some thought into how to choose what to research, about whether I could actually be an unmoved mover. A few things get in the way:

First, applying for grants doesn’t just take storytelling skills, it takes scientific knowledge. Grant committees aren’t experts in everything, but they usually send grants to be reviewed by much more appropriate experts. These experts will check if your grant makes sense. In order to make the grant make sense, you have to know enough about the faddish topic to propose something reasonable. You have to keep up with the fad. You have to spend time reading papers, and talking to people in the faddish subfield. This takes work, but also changes your motivation. If you spend time around people excited by an idea, you’ll either get excited too, or be too drained by the dissonance to get any work done.

Second, you can’t change things that much. You still need a plausible story as to how you got from where you are to where you are going.

Third, you need to be a plausible person to do the work. If the committee looks at your CV and sees that you’ve never actually worked on the faddish topic, they’re more likely to give a grant to someone who’s actually worked on it.

Fourth, you have to choose what to do when you hire people. If you never hire any postdocs or students working on the faddish topic, then it will be very obvious that you aren’t trying to research it. If you do hire them, then you’ll be surrounded by people who actually care about the fad, and want your help to understand how to work with it.

Ultimately, to avoid the grant committee’s incentives, you need a golden tongue and a heart of stone, and even then you’ll need to spend some time working on something you think is pointless.

Even if you don’t apply for grants, even if you have a real permanent position or even tenure, you still feel some of these pressures. You’re still surrounded by people who care about particular things, by students and postdocs who need grants and jobs and fellow professors who are confident the mainstream is the right path forward. It takes a lot of strength, and sometimes cruelty, to avoid bowing to that.

So despite the ambiguous rules and lack of oversight, academics still respond to incentives: they can’t just do whatever they feel like. They aren’t bound by shareholders, they aren’t expected to make a profit. But ultimately, the things that do constrain them, expertise and cognitive load, social pressure and compassion for those they mentor, those can be even stronger.

I suspect that those pressures dominate the private sector as well. My guess is that for all that companies think of themselves as trying to maximize profits, the all-too-human motivations we share are more powerful than any corporate governance structure or org chart. But I don’t know yet. Likely, I’ll find out soon.

April 18, 2024

Tommaso DorigoOn Rating Universities

In a world where we live hostages of advertisement, where our email addresses and phone numbers are sold and bought by companies eager to intrude in our lives and command our actions, preferences, tastes; in a world where appearance trumps substance 10 to zero, where your knowledge and education are less valued than your looks, a world where truth is worth dimes and myths earn you millions - in this XXI century world, that is, Universities look increasingly out of place. 

read more

April 17, 2024

John BaezAgent-Based Models (Part 8)

Last time I presented a class of agent-based models where agents hop around a graph in a stochastic way. Each vertex of the graph is some ‘state’ agents can be in, and each edge is called a ‘transition’. In these models, the probability per time of an agent making a transition and leaving some state can depend on when it arrived at that state. It can also depend on which agents are in other states that are ‘linked’ to that edge—and when those agents arrived.

I’ve been trying to generalize this framework to handle processes where agents are born or die—or perhaps more generally, processes where some number of agents turn into some other number of agents. There’s already a framework that does something sort of like this. It’s called ‘stochastic Petri nets’, and we explained this framework here:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, World Scientific Press, Singapore, 2018. (See also blog articles here.)

However, in their simplest form, stochastic Petri nets are designed for agents whose only distinguishing information is which state they’re in. They don’t have ‘names’—that is, individual identities. Thus, even calling them ‘agents’ is a bit of a stretch: usually they’re called ‘tokens’, since they’re drawn as black dots.

We could try to enhance the Petri net framework to give tokens names and other identifying features. There are various imaginable ways to do this, such as ‘colored Petri nets’. But so far this approach seems rather ill-adapted for processes where agents have identities—perhaps because I’m not thinking about the problem the right way.

So, at some point I decided to try something less ambitious. It turns out that in applications to epidemiology, general processes where n agents come in and m go out are not often required. So I’ve been trying to minimally enhance the framework from last time to include ‘birth’ and ‘death’ processes as well as transitions from state to state.

As I thought about this, some questions kept plaguing me:

When an agent gets created, or ‘born’, which one actually gets born? In other words, what is its name? Its precise name may not matter, but if we want to keep track of it after it’s born, we need to give it a name. And this name had better be ‘fresh’: not already the name of some other agent.

There’s also the question of what happens when an agent gets destroyed, or ‘dies’. This feels less difficult: there just stops being an agent with the given name. But probably we want to prevent a new agent from having the same name as that dead agent.

Both these questions seem fairly simple, but so far they’re making it hard for me to invent a truly elegant framework. At first I tried to separately describe transitions between states, births, and deaths. But this seemed to triplicate the amount of work I needed to do.

Then I tried models that have

• a finite set S of states,

• a finite set T of transitions,

• maps u, d \colon T \to S + \{\textrm{undefined}\} mapping each transition to its upstream and downstream states.

Here S + \{\textrm{undefined}\} is the disjoint union of S and a singleton whose one element is called undefined. Maps from T to S + \{\textrm{undefined}\} are a standard way to talk about partially defined maps from T to S. We get four cases:

1) If the downstream of a transition is defined (i.e. in S) but its upstream is undefined we call this transition a birth transition.

2) If the upstream of a transition is defined but its downstream is undefined we call this transition a death transition.

3) If the upstream and downstream of a transition are both defined we call this transition a transformation. In practice most transitions will be of this sort.

4) We never need transitions whose upstream and downstream are undefined: these would describe agents that pop into existence and instantly disappear.

This is sort of nice, except for the fourth case. Unfortunately when I go ahead and try to actually describe a model based on this paradigm, I seem still to wind up needing to handle births, deaths and transformations quite differently.

For example, last time my models had a fixed set A of agents. To handle births and deaths, I wanted to make this set time-dependent. But I need to separately say how this works for transformations, birth transitions and death transitions. For transformations we don’t change A. For birth transitions we add a new element to A. And for death transitions we remove an element from A, and maybe record its name on a ledger or drive a stake through its heart to make sure it can never be born again!

So far this is tolerable, but things get worse. Our model also needs ‘links’ from states to transitions, to say how agents present in those states affect the timing of those transitions. These are used in the ‘jump function’, a stochastic function that answers this question:

If at time t agent a arrives at the state upstream to some transition e, and the agents at states linked to the transition e form some set S_e, when will agent a make the transition e given that it doesn’t do anything else first?

This works fine for transformations, meaning transitions e that have both an upstream and downstream state. It works just a tiny bit differently for death transitions. But birth transitions are quite different: since newly born agents don’t have a previous upstream state u(e), they don’t have a time at which they arrived at that state.

Perhaps this is just how modeling works: perhaps the search for a staggeringly beautiful framework is a distraction. But another approach just occurred to me. Today I just want to briefly state it. I don’t want to write a full blog article on it yet, since I’ve already spent a lot of time writing two articles that I deleted when I became disgusted with them—and I might become disgusted with this approach too!

Briefly, this approach is exactly the approach I described last time. There are fundamentally no births and no deaths: all transitions have an upstream and a downstream state. There is a fixed set A of agents that does not change with time. We handle births and deaths using a dirty trick.

Namely, births are transitions out of an ‘unborn’ state. Agents hang around in this state until they are born.

Similarly, deaths are transitions to a ‘dead’ state.

There can be multiple ‘unborn’ states and ‘dead’ states. Having multiple unborn states makes it easy to have agents with different characteristics enter the model. Having multiple dead states makes it easy for us to keep tallies of different causes of death. We should make the unborn states distinct from the dead states to prevent ‘reincarnation’—that is, the birth of a new agent that happens to equal an agent that previously died.

I’m hoping that when we proceed this way, we can shoehorn birth and death processes into the framework described last time, without really needing to modify it at all! All we’re doing is exploiting it in a new way.
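In code, the trick costs almost nothing: we just add the reservoir states and pre-populate them with spare agents. Here is a minimal sketch; every state, transition and agent name in it is invented purely for illustration:

```python
# Births and deaths encoded as ordinary transitions, using reservoir states.
states = ["unborn", "susceptible", "infected", "dead"]

# Every transition has an upstream and a downstream state: no special cases.
transitions = {
    "birth":     ("unborn", "susceptible"),
    "infection": ("susceptible", "infected"),
    "death":     ("infected", "dead"),
}

# A fixed, never-changing set of agents; spares wait in "unborn" until born.
agents = {f"a{i}": "unborn" for i in range(100)}
for name in list(agents)[:10]:
    agents[name] = "susceptible"        # seed the model with 10 living agents

def apply(agent, transition):
    """Move an agent along a transition, checking it sits in the upstream state."""
    upstream, downstream = transitions[transition]
    assert agents[agent] == upstream
    agents[agent] = downstream

apply("a10", "birth")        # a fresh agent is 'born'
apply("a0", "infection")
apply("a0", "death")         # "a0" now sits in "dead" and can never be reborn
```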

Here’s one possible problem: if we start with a finite number of agents in the ‘unborn’ states, the population of agents can’t grow indefinitely! But this doesn’t seem very dire. For most agent-based models we don’t feel a need to let the number of agents grow arbitrarily large. Or we can relax the requirement that the set of agents is finite, and put an infinite number of agents u_1, u_2, u_3, \dots in an unborn state. This can be done without using an infinite amount of memory: it’s a ‘potential infinity’ rather than an ‘actual infinity’.

There could be other problems. So I’ll post this now before I think of them.

April 16, 2024

John BaezAgent-Based Models (Part 7)

Last time I presented a simple, limited class of agent-based models where each agent independently hops around a graph. I wrote:

Today the probability for an agent to hop from one vertex of the graph to another by going along some edge will be determined the moment the agent arrives at that vertex. It will depend only on the agent and the various edges leaving that vertex. Later I’ll want this probability to depend on other things too—like whether other agents are at some vertex or other. When we do that, we’ll need to keep updating this probability as the other agents move around.

Let me try to figure out that generalization now.

Last time I discovered something surprising to me. To describe it, let’s bring in some jargon. The conditional probability per time of an agent making a transition from its current state to a chosen other state (given that it doesn’t make some other transition) is called the hazard function of that transition. In a Markov process, the hazard function is actually a constant, independent of how long the agent has been in its current state. In a semi-Markov process, the hazard function is a function only of how long the agent has been in its current state.

For example, people like to describe radioactive decay using a Markov process, since experimentally it doesn’t seem that ‘old’ radioactive atoms decay at a higher or lower rate than ‘young’ ones. (Quantum theory says this can’t be exactly true, but nobody has seen deviations yet.) On the other hand, the death rate of people is highly non-Markovian, but we might try to describe it using a semi-Markov process. Shortly after birth it’s high—that’s called ‘infant mortality’. Then it goes down, and then it gradually increases.

We definitely want our agent-based processes to have the ability to describe semi-Markov processes. What surprised me last time is that I could do it without explicitly keeping track of how long the agent has been in its current state, or when it entered its current state!

The reason is that we can decide which state an agent will transition to next, and when, as soon as it enters its current state. This decision is random, of course. But using random number generators we can make this decision the moment the agent enters the given state—because there is nothing more to be learned by waiting! I described an algorithm for doing this.
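For what it’s worth, here is one standard way to implement that idea, via inverse-transform sampling of each waiting time. This is only a sketch, not necessarily the exact algorithm from last time, and the Weibull hazard and the numbers in it are arbitrary choices:

```python
import math, random

# Waiting time for a single transition with Weibull hazard
# h(tau) = (k/lam) * (tau/lam)**(k - 1).  With k = 1 this is the constant-hazard
# (exponential, Markov) case; k != 1 gives a genuinely semi-Markov waiting time.
def sample_waiting_time(lam, k):
    u = 1.0 - random.random()            # uniform in (0, 1]
    return lam * (-math.log(u)) ** (1.0 / k)

# Several competing transitions out of a state: sample a waiting time for each
# and take the smallest.  This decides both *when* and *which way* the agent
# jumps, at the moment it arrives in the state.
def schedule(transitions):               # transitions: {name: (lam, k)}
    times = {name: sample_waiting_time(lam, k) for name, (lam, k) in transitions.items()}
    chosen = min(times, key=times.get)
    return chosen, times[chosen]

print(schedule({"recover": (10.0, 1.0), "die": (50.0, 2.0)}))
```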

I’m sure this is well-known, but I had fun rediscovering it.

But today I want to allow the hazard function for a given agent to make a given transition to depend on the states of other agents. In this case, if some other agent randomly changes state, we will need to recompute our agent’s hazard function. There is probably no computationally feasible way to avoid this, in general. In some analytically solvable models there might be—but we’re simulating systems precisely because we don’t know how to solve them analytically.

So now we’ll want to keep track of the residence time of each agent—that is, how long it’s been in its current state. But William Waites pointed out a clever way to do this: it’s cheaper to keep track of the agent’s arrival time, i.e. when it entered its current state. This way you don’t need to keep updating the residence time. Whenever you need to know the residence time, you can just subtract the arrival time from the current clock time.

Even more importantly, our model should now have ‘informational links’ from states to transitions. If we want the presence or absence of agents in some state to affect the hazard function of some transition, we should draw a ‘link’ from that state to that transition! Of course you could say that anything is allowed to affect anything else. But this would create an undisciplined mess where you can’t keep track of the chains of causation. So we want to see explicit ‘links’.

So, here’s my new modeling approach, which generalizes the one we saw last time. For starters, a model should have:

• a finite set V of vertices or states,

• a finite set E of edges or transitions,

• maps u, d \colon E \to V mapping each edge to its source and target, also called its upstream and downstream,

• a finite set A of agents,

• a finite set L of links,

• maps s \colon L \to V and t \colon L \to E mapping each link to its source (a state) and its target (a transition).

All of this stuff, except for the set of agents, is exactly what we had in our earlier paper on stock-flow models, where we treated people en masse instead of as individual agents. You can see this in Section 2.1 here:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Evan Patterson, Compositional modeling with stock and flow models.

So, I’m trying to copy that paradigm, and eventually unify the two paradigms as much as possible.

But they’re different! In particular, our agent-based models will need a ‘jump function’. This says when each agent a \in A will undergo a transition e \in E if it arrives at the state upstream to that transition at a specific time t \in \mathbb{R}. This jump function will not be deterministic: it will be a stochastic function, just as it was in yesterday’s formalism. But today it will depend on more things! Yesterday it depended only on a, e and t. But now the links will come into play.

For each transition e \in E, there is a set of links whose target is that transition, namely

t^{-1}(e) = \{\ell \in L \; \vert \; t(\ell) = e \}

Each link \ell \in  t^{-1}(e) will have one state v as its source. We say this state affects the transition e via the link \ell.

We want the jump function for the transition e to depend on the presence or absence of agents in each state that affects this transition.

Which agents are in a given state? Well, it depends! But those agents will always form some subset of A, and thus an element of 2^A. So, we want the jump function for the transition e to depend on an element of

\prod_{\ell \in t^{-1}(e)} 2^A = 2^{A \times t^{-1}(e)}

I’ll call this element S_e. And as mentioned earlier, the jump function will also depend on a choice of agent a \in A and on the arrival time of the agent a.

So, we’ll say there’s a jump function j_e for each transition e, which is a stochastic function

j_e \colon A \times 2^{A \times t^{-1}(e)} \times \mathbb{R} \rightsquigarrow \mathbb{R}

The idea, then, is that j_e(a, S_e, t) is the answer to this question:

If at time t agent a arrived at the vertex u(e), and the agents at states linked to the edge e are described by the set S_e, when will agent a move along the edge e to the vertex d(e), given that it doesn’t do anything else first?

The answer to this question can keep changing as agents other than a move around, since the set S_e can keep changing. This is the big difference between today’s formalism and yesterday’s.

Here’s how we run our model. At every moment in time we keep track of some information about each agent a \in A, namely:

• Which vertex is it at now? We call this vertex the agent’s state, \sigma(a).

• When did it arrive at this vertex? We call this time the agent’s arrival time, \alpha(a).

• For each edge e whose upstream is \sigma(a), when will agent a move along this edge if it doesn’t do anything else first? Call this time T(a,e).

I need to explain how we keep updating these pieces of information (supposing we already have them). Let’s assume that at some moment in time t_i an agent makes a transition. More specifically, suppose agent \underline{a} \in A makes a transition \underline{e} from the state

\underline{v} = u(\underline{e}) \in V

to the state

\underline{v}' = d(\underline{e}) \in V.

At this moment we update the following information:

1) We set

\alpha(\underline{a}) := t_i

(So, we update the arrival time of that agent.)

2) We set

\sigma(\underline{a}) := \underline{v}'

(So, we update the state of that agent.)

3) We recompute the subset of agents in the state \underline{v} (by removing \underline{a} from this subset) and in the state \underline{v}' (by adding \underline{a} to this subset).

4) For every transition f that’s affected by the state \underline{v} or the state \underline{v}', and for every agent a in the upstream state of that transition, we set

T(a,f) := j_f(a, S_f, \alpha(a))

where S_f is the element of 2^{A \times t^{-1}(f)} saying which subset of agents is in each state affecting the transition f. (So, we update our table of times at which agent a will make the transition f, given that it doesn’t do anything else first.)

Now we need to compute the next time at which something happens, namely t_{i+1}. And we need to compute what actually happens then!

To do this, we look through our table of times T(a,e) for each agent a and all transitions out of the state that agent is in, and see which time is smallest. If there’s a tie, break it. Then we reset \underline{a} and \underline{e} to be the agent-edge pair that minimizes T(a,e).

5) We set

t_{i+1} := T(\underline{a},\underline{e})

Then we loop back around to step 1), but with i+1 replacing i.
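If it helps, here is the same loop written out as schematic Python. Everything in it (the dictionaries, the helper functions, the jump functions) is a placeholder to be supplied by a concrete model, and it glosses over details such as tie-breaking:

```python
def run(agents, upstream, downstream, jump, links_into, affected, sigma, alpha, t_max):
    """Schematic event loop.  upstream/downstream: dicts mapping each transition
    to its upstream/downstream state; jump(e, a, S_e, arrival) returns the absolute
    time at which agent a would make transition e; links_into(e) lists the states
    linked to e; affected(v) lists the transitions whose hazard depends on state v."""
    edges_out = lambda v: [e for e in upstream if upstream[e] == v]
    occupants = lambda v: {a for a in agents if sigma[a] == v}

    T = {}                                   # T[(a, e)]: scheduled time for a to make e
    def reschedule(a, e):
        S_e = {v: occupants(v) for v in links_into(e)}
        T[(a, e)] = jump(e, a, S_e, alpha[a])

    for a in agents:
        for e in edges_out(sigma[a]):
            reschedule(a, e)

    while T:
        (a_, e_), t = min(T.items(), key=lambda kv: kv[1])      # step 5)
        if t > t_max:
            break
        v, v_new = upstream[e_], downstream[e_]
        alpha[a_] = t                                           # step 1)
        sigma[a_] = v_new                                       # step 2)
        for e in edges_out(v):                                  # step 3): a_ has left v
            T.pop((a_, e), None)
        for e in edges_out(v_new):                              # ...and arrived at v_new
            reschedule(a_, e)
        for w in (v, v_new):                                    # step 4)
            for f in affected(w):
                for a in occupants(upstream[f]):
                    reschedule(a, f)
    return sigma
```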

Whew! I hope you followed that. If not, please ask questions.

Doug NatelsonThe future of the semiconductor industry, + The Mechanical Universe

 Three items of interest:

  • This article is a nice review of present semiconductor memory technology.  The electron micrographs in Fig. 1 and the scaling history in Fig. 3 are impressive.
  • This article in IEEE Spectrum is a very interesting look at how some people think we will get to chips for AI applications that contain a trillion (\(10^{12}\)) transistors.  For perspective, the processor in the laptop I used to write this has about 40 billion transistors.  (The article is nice, though the first figure commits the terrible sin of having no y-axis number or label; clearly it's supposed to represent exponential growth as a function of time in several different parameters.)
  • Caltech announced the passing of David Goodstein, renowned author of States of Matter and several books about the energy transition.  I'd written about my encounter with him, and I wanted to take this opportunity to pass along a working link to the youtube playlist for The Mechanical Universe.  While the animation can look a little dated, it's worth noting that when this was made in the 1980s, the CGI was cutting edge stuff that was presented at siggraph.

April 15, 2024

John PreskillHow I didn’t become a philosopher (but wound up presenting a named philosophy lecture anyway)

Many people ask why I became a theoretical physicist. The answer runs through philosophy—which I thought, for years, I’d left behind in college.

My formal relationship with philosophy originated with Mr. Bohrer. My high school classified him as a religion teacher, but he co-opted our junior-year religion course into a philosophy course. He introduced us to Plato’s cave, metaphysics, and the pursuit of the essence beneath the skin of appearance. The essence of reality overlaps with quantum theory and relativity, which fascinated him. Not that he understood them, he’d hasten to clarify. But he passed along that fascination to me. I’d always loved dealing in abstract ideas, so the notion of studying the nature of the universe attracted me. A friend and I joked about growing up to be philosophers and—on account of not being able to find jobs—living in cardboard boxes next to each other.

After graduating from high school, I searched for more of the same in Dartmouth College’s philosophy department. I began with two prerequisites for the philosophy major: Moral Philosophy and Informal Logic. I adored those courses, but I adored all my courses.

As a sophomore, I embarked upon Dartmouth’s philosophy-of-science course. I was one of the course’s youngest students, but the professor assured me that I’d accumulated enough background information in science and philosophy classes. Yet he and the older students threw around technical terms, such as qualia, that I’d never heard of. Those terms resurfaced in the assigned reading, again without definitions. I struggled to follow the conversation.

Meanwhile, I’d been cycling through the sciences. I’d taken my high school’s highest-level physics course, senior year—AP Physics C: Mechanics and Electromagnetism. So, upon enrolling in college, I made the rounds of biology, chemistry, and computer science. I cycled back to physics at the beginning of sophomore year, taking Modern Physics I in parallel with Informal Logic. The physics professor, Miles Blencowe, told me, “I want to see physics in your major.” I did, too, I assured him. But I wanted to see most subjects in my major.

Miles, together with department chair Jay Lawrence, helped me incorporate multiple subjects into a physics-centric program. The major, called “Physics Modified,” stood halfway between the physics major and the create-your-own major offered at some American liberal-arts colleges. The program began with heaps of prerequisite courses across multiple departments. Then, I chose upper-level physics courses, a math course, two history courses, and a philosophy course. I could scarcely believe that I’d planted myself in a physics department; although I’d loved physics since my first course in it, I loved all subjects, and nobody in my family did anything close to physics. But my major would provide a well-rounded view of the subject.

From shortly after I declared my Physics Modified major. Photo from outside the National Academy of Sciences headquarters in Washington, DC.

The major’s philosophy course was an independent study on quantum theory. In one project, I dissected the “EPR paper” published by Einstein, Podolsky, and Rosen (EPR) in 1935. It introduced the paradox that now underlies our understanding of entanglement. But who reads the EPR paper in physics courses nowadays? I appreciated having the space to grapple with the original text. Still, I wanted to understand the paper more deeply; the philosophy course pushed me toward upper-level physics classes.

What I thought of as my last chance at philosophy evaporated during my senior spring. I wanted to apply to graduate programs soon, but I hadn’t decided which subject to pursue. The philosophy and history of physics remained on the table. A history-of-physics course, taught by cosmologist Marcelo Gleiser, settled the matter. I worked my rear off in that course, and I learned loads—but I already knew some of the material from physics courses. Moreover, I knew the material more deeply than the level at which the course covered it. I couldn’t stand the thought of understanding the rest of physics only at this surface level. So I resolved to burrow into physics in graduate school. 

Appropriately, Marcelo published a book with a philosopher (and an astrophysicist) this March.

Burrow I did: after a stint in condensed-matter research, I submerged up to my eyeballs in quantum field theory and differential geometry at the Perimeter Scholars International master’s program. My research there bridged quantum information theory and quantum foundations. I appreciated the balance of fundamental thinking and possible applications to quantum-information-processing technologies. The rigorous mathematical style (lemma-theorem-corollary-lemma-theorem-corollary) appealed to my penchant for abstract thinking. Eating lunch with the Perimeter Institute’s quantum-foundations group, I felt at home.

Craving more research at the intersection of quantum thermodynamics and information theory, I enrolled at Caltech for my PhD. As I’d scarcely believed that I’d committed myself to my college’s physics department, I could scarcely believe that I was enrolling in a tech school. I was such a child of the liberal arts! But the liberal arts include the sciences, and I ended up wrapping Caltech’s hardcore vibe around myself like a favorite denim jacket.

Caltech kindled interests in condensed matter; atomic, molecular, and optical physics; and even high-energy physics. Theorists at Caltech thought not only abstractly, but also about physical platforms; so I started to, as well. I began collaborating with experimentalists as a postdoc, and I’m now working with as many labs as I can interface with at once. I’ve collaborated on experiments performed with superconducting qubits, photons, trapped ions, and jammed grains. Developing an abstract idea, then nursing it from mathematics to reality, satisfies me. I’m even trying to redirect quantum thermodynamics from foundational insights to practical applications.

At the University of Toronto in 2022, with my experimental collaborator Batuhan Yılmaz—and a real optics table!

So I did a double-take upon receiving an invitation to present a named lecture at the University of Pittsburgh Center for Philosophy of Science. Even I, despite not being a philosopher, had heard of the cachet of Pitt’s philosophy-of-science program. Why on Earth had I received the invitation? I felt the same incredulity as when I’d handed my heart to Dartmouth’s physics department and then to a tech school. But now, instead of laughing at the image of myself as a physicist, I couldn’t see past it.

Why had I received that invitation? I did a triple-take. At Perimeter, I’d begun undertaking research on resource theories—simple, information-theoretic models for situations in which constraints restrict the operations one can perform. Hardly anyone worked on resource theories then, although they form a popular field now. Philosophers like them, and I’ve worked with multiple classes of resource theories by now.

More recently, I’ve worked with contextuality, a feature that distinguishes quantum theory from classical theories. And I’ve even coauthored papers about closed timelike curves (CTCs), hypothetical worldlines that travel backward in time. CTCs are consistent with general relativity, but we don’t know whether they exist in reality. Regardless, one can simulate CTCs, using entanglement. Collaborators and I applied CTC simulations to metrology—to protocols for measuring quantities precisely. So we kept a foot in practicality and a foot in foundations.

Perhaps the idea of presenting a named lecture on the philosophy of science wasn’t hopelessly bonkers. All right, then. I’d present it.

Presenting at the Center for Philosophy of Science

This March, I presented an ALS Lecture (an Annual Lecture Series Lecture, redundantly) entitled “Field notes on the second law of quantum thermodynamics from a quantum physicist.” Scientists formulated the second law in the early 1800s. It helps us understand why time appears to flow in only one direction. I described three enhancements of that understanding, which have grown from quantum thermodynamics and nonequilibrium statistical mechanics: resource-theory results, fluctuation theorems, and thermodynamic applications of entanglement. I also enjoyed talking with Center faculty and graduate students during the afternoon and evening. Then—being a child of the liberal arts—I stayed in Pittsburgh for half the following Saturday to visit the Carnegie Museum of Art.

With a copy of a statue of the goddess Sekhmet. She lives in the Carnegie Museum of Natural History, which shares a building with the art museum, from which I detoured to see the natural-history museum’s ancient-Egypt area (as Quantum Frontiers regulars won’t be surprised to hear).

Don’t get me wrong: I’m a physicist, not a philosopher. I don’t have the training to undertake philosophy, and I have enough work to do in pursuit of my physics goals. But my high-school self would approve—that self is still me.

April 14, 2024

John BaezProtonium

It looks like they’ve found protonium in the decay of a heavy particle!

Protonium is made of a proton and an antiproton orbiting each other. It lasts a very short time before they annihilate each other.

It’s a bit like a hydrogen atom where the electron has been replaced with an antiproton! But it’s much smaller than a hydrogen atom. And unlike a hydrogen atom, which is held together by the electric force, protonium is mainly held together by the strong nuclear force.

There are various ways to make protonium. One is to make a bunch of antiprotons and mix them with protons. This was done accidentally in 2002. They only realized this upon carefully analyzing the data 4 years later.

This time, people were studying the decay of the J/psi particle. The J/psi is made of a heavy quark and its antiparticle. It’s 3.3 times as heavy as a proton, so it’s theoretically able to decay into protonium. And careful study showed that yes, it does this sometimes!

The new paper on this has a rather dry title—not “We found protonium!” But it has over 550 authors, which hints that it’s a big deal. I won’t list them.

• BESIII Collaboration, Observation of the anomalous shape of X(1840) in J/ψ→γ3(π+π−), Phys. Rev. Lett. 132 (2024), 151901.

The idea here is that sometimes the J/ψ particle decays into a gamma ray and 3 pion-antipion pairs. When they examined this decay, they found evidence that an intermediate step involved a particle of mass 1880 MeV/c², a bit more than an already known intermediate of mass 1840 MeV/c².

This new particle’s mass is very close to twice the mass of a proton (938 MeV/c²). So, there’s a good chance that it’s protonium!

But how did physicists make protonium by accident in 2002? They were trying to make antihydrogen, which is a positron orbiting an antiproton. To do this, they used the Antiproton Decelerator at CERN. This is just one of the many cool gadgets they keep near the Swiss-French border.

You see, to create antiprotons you need to smash particles at each other at almost the speed of light—so the antiprotons usually shoot out really fast. It takes serious cleverness to slow them down and catch them without letting them bump into matter and annihilate.

That’s what the Antiproton Decelerator does. So they created a bunch of antiprotons and slowed them down. Once they managed to do this, they caught the antiprotons in a Penning trap. This holds charged particles using magnetic and electric fields. Then they cooled the antiprotons—slowed them even more—by letting them interact with a cold gas of electrons. Then they mixed in some positrons. And they got antihydrogen!

But apparently some protons got in there too, so they also made some protonium, by accident. They only realized this when they carefully analyzed the data 4 years later, in a paper with only a few authors:

• N. Zurlo, M. Amoretti, C. Amsler, G. Bonomi, C. Carraro, C. L. Cesar, M. Charlton, M. Doser, A. Fontana, R. Funakoshi, P. Genova, R. S. Hayano, L. V. Jorgensen, A. Kellerbauer, V. Lagomarsino, R. Landua, E. Lodi Rizzini, M. Macri, N. Madsen, G. Manuzio, D. Mitchard, P. Montagna, L. G. Posada, H. Pruys, C. Regenfus, A. Rotondi, G. Testera, D. P. Van der Werf, A. Variola, L. Venturelli and Y. Yamazaki, Production of slow protonium in vacuum, Hyperfine Interactions 172 (2006), 97–105.

Protonium is sometimes called an ‘exotic atom’—though personally I’d consider it an exotic nucleus. The child in me thinks it’s really cool that there’s an abbreviation for protonium, Pn, just like a normal element.

John Preskill“Once Upon a Time”…with a twist

The Noncommuting-Charges World Tour (Part 1 of 4)

This is the first part in a four part series covering the recent Perspectives article on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence.

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales—both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters—such as wise old men and fairy godmothers versus ideal gases and perfect insulators—and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.1

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have experienced modern reimagining. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.

Unfortunately, my favourite fairy tale, ‘The Hunchback of Notre-Dame,’ does not start with the classic opening line ‘Once upon a time.’ For a story that begins with this traditional phrase, ‘Cinderella’ is a great choice.

First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. For example, this means that, at most times, the state of most sufficiently small subsystems within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern’s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2) you already know the answer. The expectation value of a quantum observable is often a sum of contributions from many basis states, each carrying its own phase. As time passes, these phases tend to experience destructive interference, leading to a stable expectation value over a longer period. This stable value tends to align with that of a thermal state. Thus, despite the apparent paradox, stationary dynamics in quantum systems are commonplace.

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially outside of equilibrium such that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and charged particles (Seebeck effect), and the particle-concentration gradient will cause the same—a flow of particles (Fick’s law) and heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If you’re reading this post on your computer, the Peltier effect is likely at work for you right now, cooling your computer.
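
To see the structure of those four effects at a glance: in the linear-response regime they fit into a single matrix relation. The sketch below is schematic; the choice of thermodynamic forces and the coefficient names are standard textbook conventions, not anything from the perspective article:

$$\begin{pmatrix} J_{\text{particle}} \\ J_{\text{heat}} \end{pmatrix} = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} -\nabla(\mu/T) \\ \nabla(1/T) \end{pmatrix}, \qquad L_{12} = L_{21}.$$

The diagonal coefficients encode Fick's and Fourier's laws, the off-diagonal ones the Seebeck and Peltier effects, and Onsager reciprocity is the statement that the two off-diagonal coefficients are equal.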

What do various derivations of the thermal state’s forms, the eigenstate thermalization hypothesis (ETH), and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges leads to fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question. How would the above mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, σa (a = x, y, z), form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges: the total spin components in the x, y, and z directions, which I’ll refer to as Qx, Qy, and Qz, respectively. The Hamiltonian H conserves these charges, satisfying [H, Qa] = 0 for each a, yet the three charges do not commute with one another: [Qa, Qb] ≠ 0 for any pair a, b ∈ {x,y,z} with a≠b. It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
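
To make these charges concrete, here is a minimal numerical sketch (my own toy version: three qubits, nearest-neighbour couplings only, unit coupling strengths, so a simplification of the model in the figure) checking that the Heisenberg Hamiltonian conserves the total spin components while the components themselves refuse to commute:

import numpy as np

# Pauli matrices, standing in for the spin components of a single qubit
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
id2 = np.eye(2, dtype=complex)
paulis = {'x': sx, 'y': sy, 'z': sz}

def on_site(op, site, n):
    # Embed a single-qubit operator acting on `site` of an n-qubit chain
    out = np.array([[1.0 + 0j]])
    for i in range(n):
        out = np.kron(out, op if i == site else id2)
    return out

n = 3  # a tiny chain is enough to make the point

# Nearest-neighbour Heisenberg coupling: H = sum over sites and components of s_a(i) s_a(i+1)
H = sum(on_site(s, i, n) @ on_site(s, i + 1, n)
        for i in range(n - 1) for s in paulis.values())

# The three charges: total spin components Q_a = sum over sites of s_a(i)
Q = {a: sum(on_site(s, i, n) for i in range(n)) for a, s in paulis.items()}

def comm(A, B):
    return A @ B - B @ A

for a in paulis:
    print(f"[H, Q{a}] = 0 ?", np.allclose(comm(H, Q[a]), 0))     # True: conserved
print("[Qx, Qy] = 0 ?", np.allclose(comm(Q['x'], Q['y']), 0))    # False: noncommuting

The same check goes through for longer chains and for the next-nearest-neighbour couplings in the figure; only the matrix sizes grow.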

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I’ve designed this series specifically for a target audience: graduate students in physics. Additionally, I’m gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

  1. This opening text was taken from the draft of my thesis. ↩

April 13, 2024

Doug NatelsonElectronic structure and a couple of fun links

Real life has been very busy recently.  Posting will hopefully pick up soon.  

One brief item.  Earlier this week, Rice hosted Gabi Kotliar for a distinguished lecture, and he gave a very nice, pedagogical talk about different approaches to electronic structure calculations.  When we teach undergraduate chemistry on the one hand and solid state physics on the other, we largely neglect electron-electron interactions (except for very particular issues, like Hund's Rules).  Trying to solve the many-electron problem fully is extremely difficult.  Often, approximating by solving the single-electron problem (e.g. finding the allowed single-electron states for a spatially periodic potential as in a crystal) and then "filling up"* those states gives decent results.   As we see in introductory courses, one can try different types of single-electron states.  We can start with atomic-like orbitals localized to each site, and end up doing tight binding / LCAO / Hückel (when applied to molecules).  Alternately, we can do the nearly-free electron approach and think about Bloch waves.  Density functional theory, discussed here, is more sophisticated but can struggle with situations when electron-electron interactions are strong.
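
As a reminder of how simple the single-electron starting point can be, here is the standard one-band, one-dimensional tight-binding dispersion (textbook material, not anything specific to the talk):

import numpy as np

# One-band, 1D tight-binding chain: on-site energy eps0, nearest-neighbour hopping t.
# The single-electron Bloch states have dispersion E(k) = eps0 - 2 t cos(k a).
eps0, t, a = 0.0, 1.0, 1.0                      # arbitrary illustrative units
k = np.linspace(-np.pi / a, np.pi / a, 201)     # first Brillouin zone
E = eps0 - 2 * t * np.cos(k * a)
print(f"band runs from {E.min():.2f} to {E.max():.2f}; bandwidth = {E.max() - E.min():.2f} (= 4t)")

The nearly-free-electron approach starts from the opposite limit, plane waves weakly perturbed by the lattice, and arrives at bands just the same.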

One of Prof. Kotliar's big contributions is something called dynamical mean field theory, an approach to strongly interacting problems.  In a "mean field" theory, the idea is to reduce a many-particle interacting problem to an effective single-particle problem, where that single particle feels an interaction based on the averaged response of the other particles.  Arguably the most famous example is in models of magnetism.  We know how to write the energy of a spin \(\mathbf{s}_{i}\) in terms of its interactions \(J\) with other spins \(\mathbf{s}_{j}\) as \(\sum_{j} J \mathbf{s}_{i}\cdot \mathbf{s}_{j}\).  If there are \(z\) such neighbors that interact with spin \(i\), then we can try instead writing that energy as \(zJ \mathbf{s}_{i} \cdot \langle \mathbf{s}_{i}\rangle\), where the angle brackets signify the average.  From there, we can get a self-consistent equation for \(\langle \mathbf{s}_{i}\rangle\).  
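
Here is a minimal sketch of that self-consistency loop, written for the scalar (Ising-like) case m = tanh(beta z J m) for simplicity; the parameter values are arbitrary illustrations, not tied to any real material:

import numpy as np

# Mean-field (Curie-Weiss) self-consistency, m = tanh(beta * z * J * m),
# solved by straightforward fixed-point iteration.
def mean_field_m(T, z=4, J=1.0, m0=0.5, tol=1e-10, max_iter=100000):
    beta = 1.0 / T
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(beta * z * J * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

Tc = 4 * 1.0   # mean-field critical temperature, z * J
for T in (0.5 * Tc, 0.9 * Tc, 1.1 * Tc):
    print(f"T/Tc = {T / Tc:.1f}  ->  m = {mean_field_m(T):.3f}")

Below the mean-field critical temperature zJ the iteration settles on a nonzero magnetization; above it, only m = 0 survives. The "dynamical" in DMFT refers to promoting this kind of self-consistency to a frequency-dependent one.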

Dynamical mean field theory is rather similar in spirit; there are non-perturbative ways to solve some strong-interaction "quantum impurity" problems.  DMFT is like a way of approximating a whole lattice of strongly interacting sites as a self-consistent quantum impurity problem for one site.  The solutions are not for wave functions but for the spectral function.  We still can't solve every strongly interacting problem, but Prof. Kotliar makes a good case that we have made real progress in how to think about many systems, and when the atomic details matter.

*Here, "filling up" means writing the many-electron wave function as a totally antisymmetric linear combination of single-electron states, including the spin states.


April 12, 2024

Matt von HippelThe Hidden Higgs

Peter Higgs, the theoretical physicist whose name graces the Higgs boson, died this week.

Peter Higgs, after the Higgs boson discovery was confirmed

This post isn’t an obituary: you can find plenty of those online, and I don’t have anything special to say that others haven’t. Reading the obituaries, you’ll notice they summarize Higgs’s contribution in different ways. Higgs was one of the people who proposed what today is known as the Higgs mechanism, the principle by which most (perhaps all) elementary particles gain their mass. He wasn’t the only one: Robert Brout and François Englert proposed essentially the same idea in a paper that was published two months earlier, in August 1964. Two other teams came up with the idea slightly later than that: Gerald Guralnik, Carl Richard Hagen, and Tom Kibble were published one month after Higgs, while Alexander Migdal and Alexander Polyakov found the idea independently in 1965 but couldn’t get it published till 1966.

Higgs did, however, do something that Brout and Englert didn’t. His paper doesn’t just propose a mechanism, involving a field which gives particles mass. It also proposes a particle one could discover as a result. Read the more detailed obituaries, and you’ll discover that this particle was not in the original paper: Higgs’s paper was rejected at first, and he added the discussion of the particle to make it more interesting.

At this point, I bet some of you are wondering what the big deal was. You’ve heard me say that particles are ripples in quantum fields. So shouldn’t we expect every field to have a particle?

Tell that to the other three Higgs bosons.

Electromagnetism has one type of charge, with two signs: plus, and minus. There are electrons, with negative charge, and their anti-particles, positrons, with positive charge.

Quarks have three types of charge, called colors: red, green, and blue. Each of these also has two “signs”: red and anti-red, green and anti-green, and blue and anti-blue. So for each type of quark (like an up quark), there are six different versions: red, green, and blue, and anti-quarks with anti-red, anti-green, and anti-blue.

Diagram of the colors of quarks

When we talk about quarks, we say that the force under which they are charged, the strong nuclear force, is an “SU(3)” force. The “S” and “U” there are shorthand for mathematical properties that are a bit too complicated to explain here, but the “(3)” is quite simple: it means there are three colors.

The Higgs boson’s primary role is to make the weak nuclear force weak, by making the particles that carry it from place to place massive. (That way, it takes too much energy for them to go anywhere, a feeling I think we can all relate to.) The weak nuclear force is an “SU(2)” force. So there should be two “colors” of particles that interact with the weak nuclear force…which includes Higgs bosons. For each, there should also be an anti-color, just like the quarks had anti-red, anti-green, and anti-blue. So we need two “colors” of Higgs bosons, and two “anti-colors”, for a total of four!

But the Higgs boson discovered at the LHC was a neutral particle. It didn’t have any electric charge, or any color. There was only one, not four. So what happened to the other three Higgs bosons?

The real answer is subtle, one of those physics things that’s tricky to concisely explain. But a partial answer is that they’re indistinguishable from the W and Z bosons.

Normally, the fundamental forces have transverse waves, with two polarizations. Light can wiggle along its path back and forth, or up and down, but it can’t wiggle forward and backward. A fundamental force with massive particles is different, because they can have longitudinal waves: they have an extra direction in which they can wiggle. There are two W bosons (plus and minus) and one Z boson, and they all get one more polarization when they become massive due to the Higgs.

That’s three new ways the W and Z bosons can wiggle. That’s the same number as the number of Higgs bosons that went away, and that’s no coincidence. We physicists like to say that the W and Z bosons “ate” the extra Higgs, which is evocative but may sound mysterious. Instead, you can think of it as the two wiggles being secretly the same, mixing together in a way that makes them impossible to tell apart.

The “count”, of how many wiggles exist, stays the same. You start with four Higgs wiggles, and two wiggles each for the precursors of the W+, W-, and Z bosons, giving ten. You end up with one Higgs wiggle, and three wiggles each for the W+, W-, and Z bosons, which still adds up to ten. But which fields match with which wiggles, and thus which particles we can detect, changes. It takes some thought to look at the whole system and figure out, for each field, what kind of particle you might find.
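
Written out as one line of bookkeeping (the grouping below is mine, just to make the count explicit):

$$\underbrace{4}_{\text{Higgs field}} + \underbrace{2\times 3}_{\text{massless }W^+,\,W^-,\,Z} \;=\; 10 \;=\; \underbrace{1}_{\text{Higgs boson}} + \underbrace{3\times 3}_{\text{massive }W^+,\,W^-,\,Z}.$$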

Higgs did that work. And now, we call it the Higgs boson.

April 11, 2024

Scott Aaronson Avi Wigderson wins Turing Award!

Back in 2006, in the midst of an unusually stupid debate in the comment section of Lance Fortnow and Bill Gasarch’s blog, someone chimed in:

Since the point of theoretical computer science is solely to recognize who is the most badass theoretical computer scientist, I can only say:

GO HOME PUNKS!

WIGDERSON OWNS YOU!

Avi Wigderson: central unifying figure of theoretical computer science for decades; consummate generalist who’s contributed to pretty much every corner of the field; advocate and cheerleader for the field; postdoc adviser to a large fraction of all theoretical computer scientists, including both me and my wife Dana; derandomizer of BPP (provided E requires exponential-size circuits). Now, Avi not only “owns you,” he also owns a well-deserved Turing Award (on top of his well-deserved Nevanlinna, Abel, Gödel, and Knuth prizes). As Avi’s health has been a matter of concern to those close to him ever since his cancer treatment, which he blogged about a few years ago, I’m sure today’s news will do much to lift his spirits.

I first met Avi a quarter-century ago, when I was 19, at a PCMI summer school on computational complexity at the Institute for Advanced Study in Princeton. Then I was lucky enough to visit Avi in Israel when he was still a professor at the Hebrew University (and I was a grad student at Berkeley)—first briefly, but then Avi invited me back to spend a whole semester in Jerusalem, which ended up being one of my most productive semesters ever. Then Avi, having by then moved to the IAS in Princeton, hosted me for a one-year postdoc there, and later he and I collaborated closely on the algebrization paper. He’s had a greater influence on my career than all but a tiny number of people, and I’m far from the only one who can say that.

Summarizing Avi’s scientific contributions could easily fill a book, but Quanta and New Scientist and Lance’s blog can all get you started if you’re interested. Eight years ago, I took a stab at explaining one tiny little slice of Avi’s impact—namely, his decades-long obsession with “why the permanent is so much harder than the determinant”—in my IAS lecture Avi Wigderson’s “Permanent” Impact On Me, to which I refer you now (I can’t produce a new such lecture on one day’s notice!).

Huge congratulations to Avi.

Jordan EllenbergRoad trip to totality 2024

The last time we did this it was so magnificent that I said, on the spot, “see you again in 2024,” and seven years didn’t dim my wish to see the sun wink out again. It was easier this time — the path went through Indiana, which is a lot closer to home than St. Louis. More importantly, CJ can drive now, and likes to, so the trip is fully chauffeured. We saw the totality in Zionsville, IN, in a little park at the end of a residential cul-de-sac.

It was a smaller crowd than the one at Festus, MO in 2017; and unlike last time there weren’t a lot of travelers. These were just people who happened to live in Zionsville, IN and who were home in the middle of the day to see the eclipse. There were clouds, and a lot of worries about the clouds, but in the end it was just thin cirrus strips that blocked the sun, and then the non-sun, not at all.

To me it was a little less dramatic this time — because the crowd was more casual, because the temperature drop was less stark in April than it was in August, and of course because it was never again going to be the first time. But CJ and AB thought this one was better. We had very good corona. You could see a tiny red dot on the edge of the sun which was in fact a plasma prominence much bigger than the Earth.

Some notes:

  • We learned our lesson last time when we got caught in a massive traffic jam in the middle of a cornfield. We chose Zionsville because it was in the northern half of the totality, right on the highway, so we could be in the car zipping north on I-65 before the massive wave of northbound traffic out of Indianapolis caught up with us. And we were! Very satisfying, to watch on Google Maps as the traffic jam got longer and longer behind us, but was never quite where we were, as if we were depositing it behind us.
  • We had lunch in downtown Indianapolis where there is a giant Kurt Vonnegut Jr. painted on a wall. CJ is reading Slaughterhouse Five for school — in fact, to my annoyance, it’s the only full novel they’ve read in their American Lit elective. But it’s a pretty good choice for high school assigned reading. In the car I tried to explain Vonnegut’s theory of the granfalloon as it applied to “Hoosier” but neither kid was really interested.
  • We’ve done a fair number of road trips in the Mach-E and this was the first time charging created any annoyance. The Electrify America station we wanted on the way down had two chargers in use and the other two broken, so we had to detour quite a ways into downtown Lafayette to charge at a Cadillac dealership. On the way back, the station we planned on was full with one person waiting in line, so we had to change course and charge at the Whole Foods parking lot, and even there we got lucky as one person was leaving just as we arrived. The charging process probably added an hour to our trip each way.
  • While we charged at the Whole Foods in Schaumburg we hung out at the Woodfield Mall. Nostalgic feelings, for this suburban kid, to be in a thriving, functioning mall, with groups of kids just hanging out and vaguely shopping, the way we used to. The malls in Madison don’t really work like this any more. Is it a Chicago thing?
  • CJ is off to college next year. Sad to think there may not be any more roadtrips, or at least any more roadtrips where all of us are starting from home.
  • I was wondering whether total eclipses in the long run are equidistributed on the Earth’s surface and the answer is no: Ernie Wright at NASA made an image of the last 5000 years of eclipse paths superimposed:

There are more in the northern hemisphere than the southern because there are more eclipses in the summer (sun’s up longer!) and the sun is a little farther (whence visually a little smaller and more eclipsible) during northern hemisphere summer than southern hemisphere summer.

See you again in 2045!

April 09, 2024

Tommaso DorigoGoodbye Peter Higgs, And Thanks For The Boson

Peter Higgs passed away yesterday, at the age of 94. The British physicist, a winner of the 2013 Nobel Prize in Physics together with François Englert, hypothesized in 1964 the existence of the most mysterious elementary particle we know of, the Higgs boson, which was only discovered 48 years later by the ATLAS and CMS collaborations at the CERN Large Hadron Collider.


read more

April 05, 2024

Terence TaoMarton’s conjecture in abelian groups with bounded torsion

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion”. This paper fully resolves a conjecture of Katalin Marton (the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture):

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It also is technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent {12}, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
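
As a toy numerical illustration of that last property (my own sketch, not from the paper): in the group Z_2 x Z_2, two independent copies of a uniform variable on a subgroup have vanishing multidistance, while a skewed distribution gives a strictly positive value.

import math

def entropy(p):
    # Shannon entropy (in bits) of a distribution {element: probability}
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def convolve(p1, p2, add):
    # Distribution of X1 + X2 for independent X1 ~ p1, X2 ~ p2
    out = {}
    for g1, q1 in p1.items():
        for g2, q2 in p2.items():
            s = add(g1, g2)
            out[s] = out.get(s, 0.0) + q1 * q2
    return out

def multidistance(dists, add):
    # H[X_1 + ... + X_m] - (1/m) * sum_i H[X_i] for independent X_i
    total = dists[0]
    for p in dists[1:]:
        total = convolve(total, p, add)
    return entropy(total) - sum(entropy(p) for p in dists) / len(dists)

# The group Z_2 x Z_2 with componentwise addition mod 2
add = lambda g, h: ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

uniform_on_subgroup = {(0, 0): 0.5, (0, 1): 0.5}   # uniform on the subgroup {(0,0), (0,1)}
skewed = {(0, 0): 0.7, (1, 0): 0.2, (1, 1): 0.1}   # not uniform on any coset

print(multidistance([uniform_on_subgroup] * 2, add))   # 0.0
print(multidistance([skewed] * 2, add))                # strictly positive (about 0.42)

Here m = 2 and the group is tiny, but the same bookkeeping makes sense for any of the m-torsion groups in the theorem.
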
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid Gaussian random variables, as a specific feature of the Gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any pair of these three random variables is close to independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that Gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).

April 04, 2024

Tommaso DorigoSignificance Of Counting Experiments With Background Uncertainty

In the course on Statistics for Data Analysis that I give every spring to PhD students in Physics, I spend some time discussing the apparently trivial problem of evaluating the significance of an excess of observed events N over an expected background B.

This is a quite common setup in many searches in Physics and Astrophysics: you have some detection apparatus that records the number of phenomena of a specified kind, and you let it run for some time, whereafter you declare that you have observed N of them. If the occurrence of each phenomenon has equal probability and they do not influence one another, that number N is understood to be sampled from a Poisson distribution of mean B. 
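
Before that complication enters, it may help to see the simplest version of the problem, where the background B is known exactly. The sketch below (my own illustration, not taken from the course) compares the naive (N-B)/sqrt(B) estimate with the asymptotic Poisson likelihood-ratio significance; the two differ noticeably when counts are small:

import math

def z_naive(N, B):
    # Gaussian approximation: excess over sqrt(B)
    return (N - B) / math.sqrt(B)

def z_poisson(N, B):
    # Asymptotic significance from the Poisson likelihood ratio (valid for N > B, B known exactly):
    # Z = sqrt( 2 * ( N * ln(N/B) - (N - B) ) )
    return math.sqrt(2.0 * (N * math.log(N / B) - (N - B)))

for N, B in [(12, 5.0), (6, 1.0), (120, 100.0)]:
    print(f"N = {N:3d}, B = {B:6.1f}:  naive Z = {z_naive(N, B):.2f},  Poisson Z = {z_poisson(N, B):.2f}")

The point of the lecture, of course, is what changes once B itself carries an uncertainty, which this sketch deliberately ignores.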

read more

April 02, 2024

Terence TaoAI Mathematical Olympiad – Progress Prize Competition now open

The first progress prize competition for the AI Mathematical Olympiad has now launched. (Disclosure: I am on the advisory committee for the prize.) This is a competition in which contestants submit an AI model which, after the submissions deadline on June 27, will be tested (on a fixed computational resource, without internet access) on a set of 50 “private” test math problems, each of which has an answer as an integer between 0 and 999. Prior to the close of submission, the models can be tested on 50 “public” test math problems (where the results of the model are public, but not the problems themselves), as well as 10 training problems that are available to all contestants. As of this writing, the leaderboard shows that the best-performing model has solved 4 out of 50 of the questions (a standard benchmark, Gemma 7B, had previously solved 3 out of 50). A total of $2^{20} ($1.048 million) has been allocated for various prizes associated with this competition. More detailed rules can be found here.

Jordan EllenbergOrioles 13, Angels 4

I had the great privilege to be present at Camden Yards last weekend for what I believe to be the severest ass-whupping I have ever personally seen the Orioles administer. The Orioles went into the 6th winning 3-1 but the game felt like they were winning by more than that. Then suddenly they actually were — nine batters, nine runs, no outs (though in the middle of it all there was an easy double-play ball by Ramon Urias that the Angels’ shortstop Zach Neto just inexplicably dropped — it was that kind of day.) We had pitching (Grayson Rodriguez almost unhittable for six innings but for one mistake pitch), defense (Urias snagging a line drive at third almost before I saw it leave the bat) and of course a three-run homer, by Anthony Santander, to plate the 7th, 8th, and 9th of those nine runs.

Is being an Angels fan the saddest kind of fan to be right now? The Mets and the Padres, you have more of a “we spent all the money and built what should have been a superteam and didn’t win.” The A’s, you have the embarrassment of the on-field performance and the fact that your owner screwed your city and moved the team out of town. But the Angels? Somehow they just put together the two generational talents of this era of baseball and — didn’t do anything with them. There’s a certain heaviness to the sadness.

As good as the Orioles have been so far, taking three out of their first four and massively outscoring the opposition, I still think they weren’t really a 101-win team last year, and everything will have to go right again for them to be as good this year as they were last year. Our Felix Bautista replacement, Craig Kimbrel, has already blown his first and only save opportunity, which is to say he’s not really a Felix Bautista replacement. But it’s a hell of a team to watch.

The only downside — Gunnar Henderson, with a single, a triple and a home run already, is set to lead off the ninth but Hyde brings in Tony Kemp to pinch hit. Why? The fans want to see Gunnar on second for the cycle, let the fans see Gunnar on second for the cycle.

March 30, 2024

Andrew JaffeThe Milky Way

Doug NatelsonThoughts on undergrad solid-state content

Figuring out what to include in an undergraduate introduction to solid-state physics course is always a challenge.   Books like the present incarnation of Kittel are overstuffed with more content than can readily fit in a one-semester course, and because that book has grown organically from edition to edition, it's organizationally not the most pedagogical.  I'm a big fan of and have been teaching from my friend Steve Simon's Oxford Solid State Basics, which is great but a bit short for a (US) one-semester class.  Prof. Simon is interested in collecting opinions on what other topics would be good to include in a hypothetical second edition or second volume, and we thought that crowdsourcing it to this blog's readership could be fun.  As food for thought, some possibilities that occurred to me were:

  • A slightly longer discussion of field-effect transistors, since they're the basis for so much modern technology
  • A chapter or two on materials of reduced dimensionality (2D electron gas, 1D quantum wires, quantum point contacts, quantum dots; graphene and other 2D materials)
  • A discussion of fermiology (Shubnikov-de Haas, de Haas-van Alphen) - this is in Kittel, but it's difficult to explain in an accessible way
  • An introduction to the quantum Hall effect
  • Some mention of topology (anomalous velocity?  Berry connection?)
  • An intro to superconductivity (though without second quantization and the gap equation, this ends up being phenomenology)
  • Some discussion of Ginzburg-Landau treatment of phase transitions (though I tend to think of that as a topic for a statistical/thermal physics course)
  • An intro to Fermi liquid theory
  • Some additional discussion of electronic structure methods beyond the tight binding and nearly-free electron approaches in the present book (Wannier functions, an intro to density functional theory)
What do people think about this?

March 25, 2024

John PreskillMy experimental adventures in quantum thermodynamics

Imagine a billiard ball bouncing around on a pool table. High-school level physics enables us to predict its motion until the end of time using simple equations for energy and momentum conservation, as long as you know the initial conditions – how fast the ball is moving at launch, and in which direction.

What if you add a second ball? This makes things more complicated, but predicting the future state of this system would still be possible based on the same principles. What about if you had a thousand balls, or a million? Technically, you could still apply the same equations, but the problem would not be tractable in any practical sense.

Billiard balls bouncing around on a pool table are a good analogy for a many-body system like a gas of molecules. Image credit

Thermodynamics lets us make precise predictions about averaged (over all the particles) properties of complicated, many-body systems, like millions of billiard balls or atoms bouncing around, without needing to know the gory details. We can make these predictions by introducing the notion of probabilities. Even though the system is deterministic – we can in principle calculate the exact motion of every ball – there are so many balls in this system that the properties of the whole will be very close to the average properties of the balls. If you throw a six-sided die, the result is in principle deterministic and predictable, based on the way you throw it, but it’s in practice completely random to you – it could be 1 through 6, equally likely. But you know that if you cast a thousand dice, the average will be close to 3.5 – the average of all possibilities. Statistical physics enables us to calculate a probability distribution over the energies of the balls, which tells us everything about the average properties of the system. And because of entropy – the tendency for the system to go from ordered to disordered configurations – even if the probability distribution of the initial system is far from the one statistical physics predicts, after the system is allowed to bounce around and settle, the final distribution will be extremely close to a generic distribution that depends on average properties only. We call this the thermal distribution, and the process of the system mixing and settling to one of the most likely configurations – thermalization.
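
For anyone who wants to see the averaging in action, here is a quick check of the dice claim:

import random

# Averages of many fair-die throws concentrate near 3.5 = (1+2+3+4+5+6)/6.
random.seed(0)
for n in (10, 1000, 100000):
    average = sum(random.randint(1, 6) for _ in range(n)) / n
    print(f"{n:6d} dice: average = {average:.3f}")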

For a practical example – instead of billiard balls, consider a gas of air molecules bouncing around. The average energy of this gas is proportional to its temperature, which we can calculate from the probability distribution of energies. Being able to predict the temperature of a gas is useful for practical things like weather forecasting, cooling your home efficiently, or building an engine. The important properties of the initial state we needed to know – energy and number of particles – are conserved during the evolution, and we call them “thermodynamic charges”. They don’t actually need to be electric charges, although electric charge is a good example of something that’s conserved.

Let’s cross from the classical world – balls bouncing around – to the quantum one, which deals with elementary particles that can be entangled, or in a superposition. What changes when we introduce this complexity? Do systems even thermalize in the quantum world? Because of the above differences, we cannot in principle be sure that the mixing and settling of the system will happen just like in the classical cases of balls or gas molecules colliding.

A visualization of a complex pattern called a quantum scar that can develop in quantum systems. Image credit

It turns out that we can predict the thermal state of a quantum system using very similar principles and equations that let us do this in the classical case. Well, with one exception – what if we cannot simultaneously measure our critical quantities – the charges?

One of the quirks of quantum mechanics is that observing the state of the system can change it. Before the observation, the system might be in a quantum superposition of many states. After the observation, a definite classical value will be recorded on our instrument – we say that the system has collapsed to this state, and thus changed its state. There are certain observables that are mutually incompatible – we cannot know their values simultaneously, because observing one definite value collapses the system to a state in which the other observable is in a superposition. We call these observables noncommuting, because the order of observation matters – unlike in multiplication of numbers, which is a commuting operation you’re familiar with. 2 * 3 = 6, and also 3 * 2 = 6 – the order of multiplication doesn’t matter.

Electron spin is a common example that entails noncommutation. In a simplified picture, we can think of spin as an axis of rotation of our electron in 3D space. Note that the electron doesn’t actually rotate in space, but it is a useful analogy – the property is “spin” for a reason. We can measure the spin along the x-,y-, or z-axis of a 3D coordinate system and obtain a definite positive or negative value, but this observation will result in a complete loss of information about spin in the other two perpendicular directions.

An illustration of electron spin. We can imagine it as an axis in 3D space that points in a particular direction. Image from Wikimedia Commons.

If we investigate a system that conserves the three spin components independently, we will be in a situation where the three conserved charges do not commute. We call them “non-Abelian” charges, because they enjoy a non-Abelian, that is, noncommuting, algebra. Will such a system thermalize, and if so, to what kind of final state?

This is precisely what we set out to investigate. Noncommutation of charges breaks the usual derivations of the thermal state, but researchers have managed to show that with non-Abelian charges, a subtly different non-Abelian thermal state (NATS) should emerge. Nicole Yunger Halpern and I, at the Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland, have collaborated with Amir Kalev from the Information Sciences Institute (ISI) at the University of Southern California, and experimentalists from the University of Innsbruck (Florian Kranzl, Manoj Joshi, Rainer Blatt and Christian Roos) to observe thermalization in a non-Abelian system – and we’ve recently published this work in PRX Quantum.

The experimentalists used a device that can trap ions with electric fields, as well as manipulate and read out their states using lasers. Only select energy levels of these ions are used, which effectively makes them behave like electrons. The laser field can couple the ions in a way that approximates the Heisenberg Hamiltonian – an interaction that conserves the three total spin components individually. We thus construct the quantum system we want to study – multiple particles coupled with interactions that conserve noncommuting charges.

We conceptually divide the ions into a system of interest and an environment. The system of interest, which consists of two particles, is what we want to measure and compare to theoretical predictions. Meanwhile, the other ions act as the effective environment for our pair of ions – the environment ions interact with the pair in a way that simulates a large bath exchanging heat and spin.

Photo of our University of Maryland group. From left to right: Twesh Upadhyaya, Billy Braasch, Shayan Majidy, Nicole Yunger Halpern, Aleks Lasek, Jose Antonio Guzman, Anthony Munson.

If we start this total system in some initial state, and let it evolve under our engineered interaction for a long enough time, we can then measure the final state of the system of interest. To make the NATS distinguishable from the usual thermal state, I designed an initial state that is easy to prepare, and has the ions pointing in directions that result in high charge averages and relatively low temperature. High charge averages make the noncommuting nature of the charges more pronounced, and low temperature makes the state easy to distinguish from the thermal background. However, we also show that our experiment works for a variety of more-arbitrary states.

We let the system evolve from this initial state for as long as possible given experimental limitations, which was 15 ms. The experimentalists then used quantum state tomography to reconstruct the state of the system of interest. Quantum state tomography makes multiple measurements over many experimental runs to approximate the average quantum state of the system measured. We then check how close the measured state is to the NATS. We have found that it’s about as close as one can expect in this experiment!

And we know this because we have also implemented a different coupling scheme, one that doesn’t have non-Abelian charges. The expected thermal state in the latter case was reached within a distance that’s a little smaller than our non-Abelian case. This tells us that the NATS is almost reached in our experiment, and so it is a good, and the best known, thermal state for the non-Abelian system – we have compared it to competitor thermal states.

Working with the experimentalists directly has been a new experience for me. While I was focused on the theory and analyzing the tomography results they obtained, they needed to figure out practical ways to realize what we asked of them. I feel like each group has learned a lot about the tasks of the other. I have become well acquainted with the trapped ion experiment and its capabilities and limitations. Overall, it has been great collaborating with the Austrian group.

Our result is exciting, as it’s the first experimental observation within the field of non-Abelian thermodynamics! This result was observed in a realistic, non-fine-tuned system that experiences non-negligible errors due to noise. So the system does thermalize after all. We have also demonstrated that the trapped ion experiment of our Austrian friends can be used to simulate interesting many-body quantum systems. With different settings and programming, other types of couplings can be simulated in different types of experiments.

The experiment also opened avenues for future work. The distance to the NATS was greater than the analogous distance to the Abelian system. This suggests that thermalization is inhibited by the noncommutation of charges, but more evidence is needed to justify this claim. In fact, our other recent paper in Physical Review B suggests the opposite!

As noncommutation is one of the core features that distinguishes classical and quantum physics, it is of great interest to unravel the fine differences non-Abelian charges can cause. But we also hope that this research can have practical uses. If thermalization is disrupted by noncommutation of charges, engineered systems featuring them could possibly be used to build quantum memory that is more robust, or maybe even reduce noise in quantum computers. We continue to explore noncommutation, looking for interesting effects that we can pin on it. I am currently working on verifying the workings of a hypothesis that explains when and why quantum systems thermalize internally.

March 16, 2024

David Hoggsubmitted!

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.

David HoggIAIFI Symposium, day two

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (and permutations, etc.). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.

David HoggIAIFI Symposium, day one

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.

March 12, 2024

David Hoggblack holes as the dark matter

Today Cameron Norton (NYU) gave a great brown-bag talk on the possibility that the dark matter might be asteroid-mass-scale black holes. This is allowed by all constraints at present: If the masses are much smaller, the black holes evaporate or emit observably. If the masses are much larger, the black holes would create observable microlensing or dynamical signatures.

She and Kleban (NYU) are working on methods for creating such black holes primordially, by modifying the potential at inflation, creating opportunities for bubble nucleations in inflation that would subsequently collapse into small black holes after the Universe exits inflation. It's speculative obviously, but not ruled out at present!

An argument broke out during and after the talk about whether you would be injured if you were intersected by a 10^20 g black hole! My position is that you would be totally fine! Everyone else in the room disagreed with me, for many different reasons. Time to get calculating.
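
For anyone who wants to start that calculation, the basic length scale is easy to get (a back-of-the-envelope sketch, not something from the talk):

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8          # speed of light, m/s
M = 1e20 * 1e-3      # 10^20 g expressed in kg

r_s = 2 * G * M / c**2
print(f"Schwarzschild radius of a 10^20 g black hole: {r_s:.1e} m")   # about 1.5e-10 m, roughly the size of an atom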

Another great idea: Could we find stars that have captured low-mass black holes by looking for the radial-velocity signal? I got really interested in this one at the end.

David HoggThe Cannon and El Cañon

At the end of the day I got a bit of quality time in with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), who have just (actually just before I met with them) created a new implementation of The Cannon (the data-driven model of stellar photospheres originally created by Melissa Ness and me back in 2014/2015). Why!? Not because the world needs another implementation. We are building a new implementation because we plan to extend out to El Cañon, which will extend the probabilistic model into the label domain: It will properly generate or treat noisy and missing labels. That will permit us to learn latent labels, and de-noise noisy labels.

February 19, 2024

Mark GoodsellRencontres de Physique des Particules 2024

Just over a week ago the annual meeting of theoretical particle physicists (RPP 2024) was held at Jussieu, the campus of Sorbonne University where I work. I wrote about the 2020 edition (held just outside Paris) here; in keeping with tradition, this year's version also contained similar political sessions with the heads of the CNRS' relevant physics institutes and members of CNRS committees, although they were perhaps less spicy (despite rumours of big changes in the air). 

One of the roles of these meetings is as a shop window for young researchers looking to be hired in France, and a great way to demonstrate that they are interested and have a connection to the system. Of course, this isn't and shouldn't be obligatory by any means; I wasn't really aware of this prior to entering the CNRS though I had many connections to the country. But that sort of thing seems especially important after the problems described by 4gravitons recently, and his post about getting a permanent job in France -- being able to settle in a country is non-trivial, it's a big worry for both future employers and often not enough for candidates fighting tooth and nail for the few jobs there are. There was another recent case of someone getting a (CNRS) job -- to come to my lab, even -- who much more quickly decided to leave the entire field for personal reasons. Both these stories saddened me. I can understand -- there is the well-known Paris syndrome for one thing -- and the current political anxiety about immigration and the government's response to the rise of the far right (across the world), coupled with Brexit, is clearly leading to things getting harder for many. These stories are especially worrying because we expect to be recruiting for university positions in my lab this year.

I was obviously very lucky and my experience was vastly different; I love both the job and the place, and I'm proud to be a naturalised citizen. Permanent jobs in the CNRS are amazing, especially in terms of the time and freedom you have, and there are all sorts of connections between the groups throughout the country such as via the IRN Terascale or GdR Intensity Frontier; or IRN Quantum Fields and Strings and French Strings meetings for more formal topics. I'd recommend anyone thinking about working here to check out these meetings and the communities built around them, as well as taking the opportunity to find out about life here. For those moving with family, France also offers a lot of support (healthcare, childcare, very generous holidays, etc) once you have got into the system.

The other thing to add that was emphasised in the political sessions at the RPP (reinforcing the message that we're hearing a lot) is that the CNRS is very keen to encourage people from under-represented groups to apply and be hired. One of the ways they see to help this is to put pressure on the committees to hire researchers (even) earlier after their PhD, in order to reduce the length of the leaky pipeline.

Back to physics

Coming back to the RPP, this year was particularly well attended and had an excellent program of reviews of hot topics, invited and contributed talks, put together very carefully by my colleagues. It was particularly poignant for me because two former students from my lab whom I worked with a lot, one of whom recently got a permanent job, were speaking; and in addition both a former student of mine and his current PhD student were giving talks: this made me feel old. (All these talks were fascinating, of course!) 

One review that stood out as relevant for this blog was Bogdan Malaescu's review of progress in understanding the problem with muon g-2. As I discussed here, there is currently a lot of confusion in what the Standard Model prediction should be for that quantity. This is obviously very concerning for the experiments measuring muon g-2, who in a paper last year reduced their uncertainty by a factor of 2 to $$a_\mu (\mathrm{exp}) = 116 592 059(22)\times 10^{−11}. $$

The Lattice calculation (which has been confirmed now by several groups) disagrees, however, with the prediction using the data-driven R-ratio method, and there is a race on to understand why. New data from the CMD-3 experiment seems to agree with the lattice result, but combining all global data on measurements of \(e^+ e^- \rightarrow \pi^+ \pi^- \) still gives a discrepancy of more than \(5\sigma\). There is clearly a significant disagreement within the data samples used (indeed, CMD-3 significantly disagrees with their own previous measurement, CMD-2). The confusion is summarised by this plot:

As can be seen, the finger of blame is often pointed at the KLOE data; excluding it but including the others in the plot gives agreement with the lattice result and a significance of non-zero \(\Delta a_\mu\) compared to experiment of \(2.8\sigma\) (or for just the dispersive method without the lattice data \( \Delta a_\mu \equiv a_\mu^{\rm SM} - a_\mu^{\rm exp} = −123 \pm 33 \pm 29 \pm 22 \times 10^{-11} \) , a discrepancy of \(2.5\sigma\)). In Bogdan's talk (see also his recent paper) he discusses these tensions and also the tensions between the data and the evaluation of \(a_\mu^{\rm win}\), which is the contribution coming from a narrow "window" (when the total contribution to the Hadronic Vacuum Polarisation is split into short, medium and long-distance pieces, the medium-range part should be the one most reliable for lattice calculations -- at short distances the lattice spacing may be too small, and at long ones the lattice may not be large enough). There he shows that, if we exclude the KLOE data and just include the BABAR, CMD-3 and Tau data, while the overall result agrees with the BMW lattice result, the window one disagrees by \(2.9 \sigma\) [thanks Bogdan for the correction to the original post]. It's clear that there is still a lot to be understood in the discrepancies of the data, and perhaps, with the added experimental precision on muon g-2, there is even still a hint of new physics ...
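
For what it's worth, the quoted \(2.5\sigma\) can be reproduced by combining the three quoted uncertainties in quadrature (an assumption on my part about how they are meant to be combined):

import math

# Naive reproduction of the quoted dispersive-only significance,
# assuming the three uncertainties are independent and added in quadrature.
delta_a_mu = 123e-11
sigma = math.sqrt(33**2 + 29**2 + 22**2) * 1e-11
print(f"combined uncertainty ~ {sigma / 1e-11:.0f}e-11, significance ~ {delta_a_mu / sigma:.1f} sigma")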

February 13, 2024

Jordan EllenbergAlphabetical Diaries

Enough of this.Enough.Equivocal or vague principles, as a rule, will make your life an uninspired, undirected, and meaningless act.

This is taken from Alphabetical Diaries, a remarkable book I am reading by Sheila Heti, composed of many thousands of sentences drawn from her decades of diaries and presented in alphabetical order. It starts like this:

A book about how difficult it is to change, why we don’t want to, and what is going on in our brain.A book can be about more than one thing, like a kaleidoscope, it can have many things that coalesce into one thing, different strands of a story, the attempt to do several, many, more than one thing at a time, since a book is kept together by the binding.A book like a shopping mart, all the selections.A book that does only one thing, one thing at a time.A book that even the hardest of men would read.A book that is a game.A budget will help you know where to go.

How does a simple, one might even say cheap, technique, one might even say gimmick, work so well? I thrill to the aphorisms even when I don’t believe them, as with the aphorism above: principles must be equivocal or at least vague to work as principles; without the necessary vagueness they are axioms, which are not good for making one’s life a meaningful act, only good for arguing on the Internet. I was reading Alphabetical Diaries while I walked home along the southwest bike path. I stopped for a minute and went up a muddy slope into the cemetery where there was a gap in the fence, and it turned out this gap opened on the area of infant graves, graves about the size of a book, graves overlaying people who were born and then did what they did for a week and then died — enough of this.

January 24, 2024

Robert HellingHow do magnets work?

I came across this excerpt from a Christian home-schooling book:

which is of course funny in so many ways, not least because the whole process of "seeing" is electromagnetic at its very core, and of course most people will have felt electricity at some point in their life. Even historically, this is pretty much how it was discovered by Galvani (using frogs' legs) at a time when electricity was about cat skins and amber.

It also brings to mind the quite famous YouTube video in which Feynman, interviewed by the BBC, first gets somewhat angry about the question of how magnets work and then goes into a quite deep explanation of what it means to explain something.
 

But how do magnets work? When I look at what my kids are taught in school, it basically boils down to "a magnet is made up of tiny magnets that all align" which if you think about it is actually a non-explanation. Can we do better (using more than layman's physics)? What is it exactly that makes magnets behave like magnets?

I would define magnetism as the force that moving charges feel in an electromagnetic field (the part proportional to the velocity) or said the other way round: The magnetic field is the field that is caused by moving charges. Using this definition, my interpretation of the question about magnets is then why permanent magnets feel this force.  For the permanent magnets, I want to use the "they are made of tiny magnets" line of thought but remove the circularity of the argument by replacing it by "they are made of tiny spins". 

This transforms the question to "Why do the elementary particles that make up matter feel the same force as moving charges even if they are not moving?".

And this question has an answer: Because they are Dirac particles! At low energies, the Dirac equation reduces to the Pauli equation, which (thanks to minimal coupling) involves the term
$$(\vec\sigma\cdot(\vec p+q\vec A))^2$$
and when you expand the square, it contains the cross term (in Coulomb gauge)
$$(\vec\sigma\cdot \vec p)(\vec\sigma\cdot q\vec A)= q\,\vec A\cdot\vec p + i\,(\vec p\times q\vec A)\cdot\vec\sigma$$
Here, the first term is the one responsible for the interaction of the magnetic field with moving charges, while the second one couples $$\nabla\times\vec A$$ to the operator $$\vec\sigma$$, i.e. the spin. And since you need to have both terms, this links the force on moving charges to this property we call spin. If you like, the fact that the g-factor is non-vanishing is the core of the explanation of how magnets work.
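For completeness, here is the expansion written out in one line (a sketch, assuming the conventions \(\hbar=1\), \(\vec p=-i\nabla\) and \(\vec B=\nabla\times\vec A\), with \(q\) the charge appearing in the minimal coupling above):
$$\bigl(\vec\sigma\cdot(\vec p+q\vec A)\bigr)^2 = (\vec p+q\vec A)^2 + i\,\vec\sigma\cdot\bigl[(\vec p+q\vec A)\times(\vec p+q\vec A)\bigr] = (\vec p+q\vec A)^2 + q\,\vec\sigma\cdot\vec B,$$
where the last step uses \((\vec p\times\vec A+\vec A\times\vec p)\psi = -i(\nabla\times\vec A)\,\psi\). The \(q\,\vec\sigma\cdot\vec B\) piece is the spin coupling (with the tree-level value \(g=2\)) that the argument above relies on.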

And if you want, you can add spin-statistics, which then implies the full "stability of matter" story that is, in the end, responsible for the fact that you can form macroscopic objects out of Dirac particles that can be magnets.


January 20, 2024

Jacques Distler Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, \(\mathcal{H}\), of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $$\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{{(2\pi)}^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^* \phi(\vec{k})$$
  2. As usual, the time-evolution is given by a Schrödinger equation
$$i\partial_t |\psi\rangle = H_0 |\psi\rangle \qquad (1)$$

where \(H_0 = \sqrt{\vec{p}^2+m^2}\). Now, you might object that it is hard to make sense of a pseudo-differential operator like \(H_0\). Perhaps. But it’s not any harder than making sense of \(U(t)= e^{-i \vec{p}^2 t/2m}\), which we routinely pretend to do in elementary quantum. In both cases, we use the fact that, in the momentum representation, the operator \(\vec{p}\) is represented as multiplication by \(\vec{k}\).
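To make the "multiplication in the momentum representation" point concrete, here is a minimal numerical sketch (not from the post; the 1D grid, the units \(\hbar=c=1\), the mass and the Gaussian initial profile are all illustrative assumptions). The relativistic evolution is no harder to implement than the nonrelativistic one:

    import numpy as np

    # Illustrative setup (hbar = c = 1), reduced to 1D for simplicity.
    m, t = 1.0, 5.0
    k = np.linspace(-10.0, 10.0, 4096)      # momentum grid
    phi0 = np.exp(-(k - 1.0)**2)            # some initial momentum-space profile

    # In the momentum representation both Hamiltonians act by multiplication,
    # so time evolution is just a phase -- relativistic or not.
    E_rel = np.sqrt(k**2 + m**2)            # H0 = sqrt(p^2 + m^2)
    E_nr = k**2 / (2.0*m)                   # familiar nonrelativistic comparison
    phi_rel = np.exp(-1j*E_rel*t) * phi0
    phi_nr = np.exp(-1j*E_nr*t) * phi0

    # 1D analogue of the Lorentz-invariant norm from the inner product above:
    dk = k[1] - k[0]
    norm = np.sum(np.abs(phi_rel)**2 / (2.0*np.pi * 2.0*E_rel)) * dk
    print(norm)   # unchanged by the evolution, since |phi| is unchanged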

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, \(\vec{x}\), satisfies \([x^i,p_j] = i \delta^{i}_j\). Thus it can be written in the form $$x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$$ for some real function \(f_i\). What is \(f_i(\vec{k})\)?
  2. Define \(J^0(\vec{r})\) to be the probability density. That is, when the particle is in state \(|\phi\rangle\), the probability for finding it in some Borel subset \(S\subset\mathbb{R}^3\) is given by $$\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$$ Obviously, \(J^0(\vec{r})\) must take the form $$J^0(\vec{r}) = \int\frac{d^3\vec{k}\, d^3\vec{k}'}{{(2\pi)}^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{{\vec{k}'}^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\, \phi(\vec{k})\, \phi(\vec{k}')^*$$ Find \(g(\vec{k},\vec{k}')\). (Hint: you need to diagonalize the operator \(\vec{x}\) that you found in problem 1.)
  3. The conservation of probability says $$0=\partial_t J^0 + \partial_i J^i$$ Use the Schrödinger equation (1) to find \(J^i(\vec{r})\).
  4. Under Lorentz transformations, \(H_0\) and \(\vec{p}\) transform as the components of a 4-vector. For a boost in the \(z\)-direction, of rapidity \(\lambda\), we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda)\, p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda)\, p_3 \end{split}$$ and we should be able to write \(U_\lambda = e^{i\lambda B}\) for some self-adjoint operator, \(B\). What is \(B\)? (N.B.: by contrast, the \(x^i\), introduced above, do not transform in a simple way under Lorentz transformations.)

The Hilbert space of a free scalar field is now \(\bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}\). That’s perhaps not the easiest way to get there. But it is a way …

Update:

Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a free scalar field as $$\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n, \qquad \mathcal{H}_n = \text{Sym}^n \mathcal{H},$$ where \(\mathcal{H}\) is the 1-particle Hilbert space described above (also known as the spin-\(0\), mass-\(m\), irreducible unitary representation of Poincaré).
  2. The Hamiltonian of the free scalar field is the direct sum of the induced Hamiltonians on \(\mathcal{H}_n\), induced from the Hamiltonian, \(H=\sqrt{\vec{p}^2+m^2}\), on \(\mathcal{H}\). In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., don’t change the particle number), and hence we can discuss their restriction to \(\mathcal{H}_n\).

Gotta keep reminding myself why I decided to foreswear blogging…

December 20, 2023

Richard EastherA Bigger Sky

Amongst everything else that happened in 2023, a key anniversary of a huge leap in our understanding of the Universe passed largely unnoticed – the centenary of the realisation that not only was our Sun one of many stars in the Milky Way galaxy but that our galaxy was one of many galaxies in the Universe.

I had been watching the approaching anniversary for over a decade, thanks to teaching the cosmology section of the introductory astronomy course at the University of Auckland. My lectures come at the end of the semester and each October finds me showing this image – with its “October 1923” inscription – to a roomful of students.

The image was captured by the astronomer Edwin Hubble, using the world’s then-largest telescope, on top of Mt Wilson, outside Los Angeles. At first glance, it may not even look like a picture of the night sky: raw photographic images are reversed, so stars show up as dark spots against a light background. However, this odd-looking picture changed our sense of where we live in the Universe.

My usual approach when I share this image with my students is to ask for a show of hands from people with a living relative born before 1923. It’s a decent-sized class and this year a few of them had a centenarian in the family. However, I would get far more hands a decade ago, when I asked about mere 90-year-olds. And sometime soon no hands will rise at this prompt and I will have to come up with a new shtick. But it is remarkable to me that there are people alive today who were born before we understood the overall arrangement of the Universe.

For tens of thousands of years, the Milky Way – the band of light that stretches across the night sky – would have been one of the most striking sights on a dark night, once you stepped away from the fire.

Milky Way — via Unsplash

Ironically, the same technological prowess that has allowed us to explore the farthest reaches of the Universe also gives us cities and electric lights. I always ask, with another show of hands, whether my students have seen the Milky Way for themselves, and each year quite a few of them disclose that they have not. I encourage them (and everyone) to find chances to sit out under a cloudless, moonless sky and take in the full majesty of the heavens as it slowly reveals itself to you while your eyes adapt to the dark.

In the meantime, though, we make do with a projector and a darkened lecture theatre.

It was over 400 years ago that Galileo pointed the first, small telescope at the sky. In that moment the apparent clouds of the Milky Way revealed themselves to be composed of many individual stars. By the 1920s, we understood that our Sun is a star and that the Milky Way is a collection of billions of stars, with our Sun inside it. But the single biggest question in astronomy in 1923 – which, with hindsight, became known as the “Great Debate” – was whether the Milky Way was an isolated island of stars in an infinite and otherwise empty ocean of space, or one of many such islands, sprinkled across the sky.

In other words, for Hubble and his contemporaries the question was whether our galaxy was the galaxy, or one of many?

More specifically, the argument was whether nebulae, which are visible as extended patches of light in the night sky, were themselves galaxies or contained within the Milky Way. These objects, almost all of which are only detectable in telescopes, had been catalogued by astronomers as they mapped the sky with increasingly capable instruments. There are many kinds of nebulae, but the white nebulae had the colour of starlight and looked like little clouds through the eyepiece. Since the 1750s these had been proposed as possible galaxies. But until 1923 nobody knew with certainty whether they were small objects on the outskirts of our galaxy – or much larger, far more distant objects on the same scale as the Milky Way itself.

To human observers, the largest and most impressive of the nebulae is Andromeda. This was the object at which Hubble had pointed his telescope in October 1923. Hubble was renowned for his ability to spot interesting details in complex images [1] and, after the photographic plate was developed, his eye alighted on a little spot that had not been present in an earlier observation [2].

Hubble’s original guess was that this was a nova, a kind of star that sporadically flares in brightness by a factor of 1,000 or more, so he marked it and a couple of other candidates with an “N”. However, after looking back at images that he had already taken and monitoring the star through the following months Hubble came to realise that he had found a Cepheid variable – a star whose brightness changes rhythmically over weeks or months.

Stars come in a huge range of sizes and big stars are millions of times brighter than little ones, so simply looking at a star in the sky tells us little about its distance from us. But Cepheids have a useful property [3]: brighter Cepheids take longer to pass through a single cycle than their smaller siblings.

Imagine a group of people holding torches (flashlights if you are North American), each of which has a bulb with its own distinctive brightness. If this group fans out across a field at night and turns on their torches, we cannot tell how far away each person is simply by looking at the resulting pattern of lights. Is that torch faint because it is further from us than most, or because its bulb is dimmer than most? But if each person were to flash the wattage of their bulb in Morse code, we could estimate distances by comparing their apparent brightness (since distant objects appear fainter) to their actual intensity (which is encoded in the flashing light).

In the case of Cepheids they are not flashing in Morse code; instead, nature provides us with the requisite information via the time it takes for their brightness to vary from maximum to minimum and back to a maximum again.

Hubble used this knowledge to estimate the distance to Andromeda. While the number he found was lower than the best present-day estimates, it was still large enough to show that Andromeda lay far outside the Milky Way and was thus roughly the same size as our galaxy.
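The arithmetic behind "the period tells you the true brightness, and the apparent brightness then gives the distance" fits in a few lines. The period-luminosity coefficients and the example star below are purely illustrative assumptions, not Hubble's actual numbers:

    import math

    def cepheid_distance_pc(period_days, apparent_mag):
        """Toy distance estimate from a Cepheid's period and apparent magnitude."""
        # Illustrative period-luminosity calibration (assumed for this sketch):
        # longer period -> intrinsically brighter star (more negative magnitude).
        absolute_mag = -2.4 * (math.log10(period_days) - 1.0) - 4.0
        # Distance modulus: m - M = 5 log10(d / 10 pc)
        return 10.0 * 10.0 ** ((apparent_mag - absolute_mag) / 5.0)

    # Hypothetical example: a ~30-day Cepheid appearing at magnitude 19
    print(f"{cepheid_distance_pc(30.0, 19.0):.2e} pc")

Even with these made-up but roughly plausible numbers, the answer comes out at hundreds of thousands of parsecs, far beyond the extent of the Milky Way, which is the qualitative point of the measurement.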

The immediate implication, given that Andromeda is the brightest of the many nebulae we see in big telescopes, was that our Milky Way was neither alone nor unique in the Universe. Thus we confirmed that our galaxy was just one of an almost uncountable number of islands in the ocean of space – and the full scale of the cosmos yielded to human measurement for the first time, through Hubble’s careful lens on a curious star.

A modern image (made by Richard Gentler) of the Andromeda galaxy with a closeup on what is now called “Hubble’s star” taken using the (appropriately enough) Hubble Space Telescope, in the white circle. A “positive” image from Hubble’s original plate is shown at the bottom right.

Illustration Credit: NASA, ESA and Z. Levay (STScI). Credit: NASA, ESA and the Hubble Heritage Team (STScI/AURA)


[1] Astronomers in Hubble’s day used a gizmo called a “Blink Comparator” that chops quickly between two images viewed through an eyepiece, so objects changing in brightness draw attention to themselves by flickering.

[2] In most reproductions of the original plate I am hard put to spot it at all, even more so when it is projected on a screen in a lecture theatre. A bit of mild image processing makes it a little clearer, but it hardly calls attention to itself.


[3] This “period-luminosity law” had been described just 15 years earlier by Henrietta Swan Leavitt and it is still key to setting the overall scale of the Universe.

December 18, 2023

Jordan EllenbergShow report: Bug Moment, Graham Hunt, Dusk, Disq at High Noon Saloon

I haven’t done a show report in a long time because I barely go to shows anymore! Actually, though, this fall I went to three. First, The Beths, opening for The National, but I didn’t stay for The National because I don’t know or care about them; I just wanted to see the latest geniuses of New Zealand play “Expert in a Dying Field”

Next was the Violent Femmes, playing their self-titled debut in order. They used to tour a lot and I used to see them a lot, four or five times in college and grad school I think. They never really grow old and Gordon Gano never stops sounding exactly like Gordon Gano. A lot of times I go to reunion shows and there are a lot of young people who must have come to the band through their back catalogue. Not Violent Femmes! 2000 people filling the Sylvee and I’d say 95% were between 50 and 55. One of the most demographically narrowcast shows I’ve ever been to. Maybe beaten out by the time I saw Black Francis at High Noon and not only was everybody exactly my age they were also all men. (Actually, it was interesting to me there were a lot of women at this show! I think of Violent Femmes as a band for the boys.)

But I came in to write about the show I saw this weekend, four Wisconsin acts playing the High Noon. I really came to see Disq, whose single “Daily Routine” I loved when it came out and I still haven’t gotten tired of. Those chords! Sevenths? They’re something:

Dusk was an Appleton band that played funky/stompy/indie, Bug Moment had an energetic frontwoman named Rosenblatt and were one of those bands where no two members looked like they were in the same band. But the real discovery of the night, for me, was Graham Hunt, who has apparently been a Wisconsin scene fixture forever. Never heard of the guy. But wow! Indie power-pop of the highest order. When Hunt’s voice cracks and scrapes the high notes he reminds me a lot of the other great Madison noisy-indie genius named Graham, Graham Smith, aka Kleenex Girl Wonder, who recorded the last great album of the 1990s in his UW-Madison dorm room. Graham Hunt’s new album, Try Not To Laugh, is out this week. “Emergency Contact” is about as pretty and urgent as this kind of music gets.

And from his last record, If You Knew Would You Believe it, “How Is That Different,” which rhymes blanket, eye slit, left it, and orbit. Love it! Reader, I bought a T-shirt.